Inverse Optimization for the Recovery of Market Structure from Market Outcomes:
by user
Comments
Transcript
Inverse Optimization for the Recovery of Market Structure from Market Outcomes:
Inverse Optimization for the Recovery of Market Structure from Market Outcomes: An Application to the MISO Electricity Market Draft Version John Birge, Ali Hortaçsu, & Michael Pavlin June 26, 2014 Abstract We propose an inverse optimization based methodology to determine market structure from the locational pricing of a commodity. The methodology requires that the market optimally allocates goods and that locational prices correspond to shadow prices of this optimization problem. As a case-in-point, we study locational marginal price based electricity markets where prices are determined using the results of a centralized optimization for clearing the market. We apply the inverse optimization methodology to outcome data from the Midcontinent ISO electricity market and uncover transmission constraints explaining the price variation. We also discuss analytical applications such as identifying missing data and the residual demand derivative. To broaden the scope of applications, assumptions sucient to justify the methodology for competitive markets are described. 1 Introduction In electricity and many other commodity markets, price varies substantially depending on location. For example, during the month of October 2012 in the Midcontinent ISO (MISO) day-ahead electricity market, the average hourly price was $25.50 but the average hourly dierence between the high and low price was $81.28. The two panels in Figure 1 illustrate both the price variation at a particular time (histogram in Panel (a)) and the price volatility over time for each market participant as dierent active transmission constraints change the market conditions (time-series plot in Panel(b)). Analyses of these markets would benet from a structural model explaining price formation. However, building this model requires a precise 1 400 10 200 100 Frequency 300 distance from hourly mean ($) 5 unit u3726 u3956 0 u6661 u4424 u6766 0 −5 −20 0 20 40 60 −10 80 0 Day Ahead LMP 2012−10−13 3:00:00 (a) Histogram of node prices. 25 50 75 100 hours since 2012−10−01 12:00:00 (b) Time-series of deviation of price received from mean price for 5 generators. Figure 1: Price variation in Midcontinent ISO day-ahead market. model of the frictions generating these pricing disparities. In particular, to understand how electricity prices are formed, we need to know how local generation contributes to congestion throughout the market footprint. These operational details are typically private information.1 However, accurate and detailed pricing data is often readily available. In the case of electricity markets which use the dominant locational marginal pricing scheme, the allocation and prices correspond to the value of primal and dual variables in the economic dispatch problem. The economic dispatch problem is typically a linear program which determines the optimal allocation of production and consumption given market bids. The correspondance between market outcomes and the optimal solution of a partially specied optimization program promotes an inverse optimization based methodology. We propose a novel inverse optimization based recovery algorithm where unobservable market structure, which corresponds to parameters of the feasible region of the economic dispatch problem, is uncovered using observed prices and allocations. Inverse optimization may be dened as retrieving parameters of incompletely specied mathematical programming formulation from a partially observable optimal solution. The standard inverse optimization problem discussed in Zhang and Liu [1996] and elsewhere is to retrieve unknown parameters of the objective function given an optimal solution. Our problem is novel; we are searching for parameters dening the feasible space of possible allocations. Using knowledge of the economic dispatch problem, we are able to 1 In cases such as electricity and natural gas, detailed information is unavailable due to the Critical Infrastructure Information Act of 2002. 2 prespecify a particular form to the optimization; we also take advantage of a partial solution to the dual problem. Like Zhang and Liu [1996], we leverage necessary conditions for optimality to rst constrain the missing parameters and motivate an algorithm to uncover them. We present the methodology in the context of an electricity market using locational marginal prices (LMP) and present a numerical example. An LMP is a price at a particular point in the network and may be dened as the cost of purveying one additional megawatt/hour (MWh) of energy at that particular point. LMPs reect behavior of neighboring and far ung participants in the network. LMP based market design has largely come to dominate the distribution of electricity in North America where about 60% of total production is in markets using LMPs [Sahni et al., 2012]. The testing of our methodology will focus on the Midcontinent Independent System Operator (MISO) which is a LMP based market spanning eleven states and the Canadian province of Manitoba. Other LMP based markets include the California CAISO market, the Texas ERCOT market, and the PJM Interconnection, which is the largest North American market and serves over 60 million customers along the eastern seaboard. From an analytical standpoint there are several benets to acquiring a structural model for these important markets. First, estimation of hidden variables such as local market power may be performed without the endogeneity problems which plague standard econometric techniques. Second, counterfactuals may be evaluated. For instance, an accurate model of the pricing mechanism in an electricity market allows estimation of the value of new production capacity investment which takes into account the impact of market power on revenues. An ancillary benet of the inverse optimization methodology is that identication of the market structure can be performed with minimal dataidentication requires only one more data sample than there are active constraints. While we develop this methodology in the context of electricity markets, there is the potential to apply inverse optimization based market structure recovery whenever two structural assumptions hold: the market results in an ecient allocation and pricing corresponds to the value of dual variables of the ecient allocation problem. We discuss in Section 5.1 how this methodology may be extended to competitive markets for commodities other than electricity when capacity issues are the root cause of spatial price variation. These markets may include oil and gas and other commodities where transportation channels are inexible. In such markets, transactions can often not be completely observed by regulators and the methodology may be useful to regulators as well as market participants. The methodology can provide a means for regulators to make inferences about market conditions and to perform counterfactual policy analysis. As a proof of concept, we apply the inverse optimization methodology to real data from the MISO dayahead electricity market. We take advantage of high granularity data which includes precise prices and allocations throughout the market footprint as well as prices for transmission lines to derive the market 3 structure. Using the derived market structure and also published bid functions for all market participants, prices and allocations may be predicted and compared to the real market outcomes. The results show that the methodology is eective at recovering hidden parameters determining how transmission resources are utilized and the relative losses from dierent areas in the market. When these parameters are used to reconstruct prices, they are very eective in explaining price variation. In many cases we nd near perfect correlation between predicted and actual prices. However, the methodology results in somewhat larger absolute price error of about 12%. These errors appear to be systemic eects likely due to limitations in the data, such as missing supply and demand data. For instance, import and export data is not available at high granularity. In Section 5.3 we discuss how the methodology may be leveraged as an analytical tool for detecting and identifying unobservable supply and demand. From an operational point of view the most important benet of acquiring this structural information is likely the ability to evaluate local market power. Standard market concentration indices such as the Herndahl-Hirschman Index are of limited value in electricity markets because transmission constraints partially isolate submarkets resulting in frequent changes in market structure [c.f. Karthikeyan et al., 2013, Borenstein and Bushnell, 1999]. A more principled approach, outlined in Lee et al. [2010] and Lee et al. [2011] may be of more value to both participating rm and market observer alike. These papers propose using the transmission constrained residual demand derivative which provides a precise picture of the incentives of individual rms to deviate from competitive market bids into the electricity market. An ecient calculation of this quantity for a lossless electricity market is given in Xu and Baldick [2007] and relies on bids and the coecients describing the contributions of market participants to active congestion constraints. We adapt this calculation to a market with transmission line losses in Section 5 and show substantial variation in the distribution of market power in real empirical data from the MISO market. Following a discussion of the relationship of this work to the operations research, energy, and economics literatures, we present the electricity market model and ecient allocation problem (Section 2). In Section 3 we present the recovery methodology. We then discuss the application of this methodology to electricity markets and the MISO market in particular in Section 4. In Section 5 we consider the broader context by discussing missing data, competitive markets, and several applications. First, we show that the methodology may be applied to competitive markets with capacitated shared resources. The analytical applications include demonstrating a simple methodology for uncovering unobservable supply and discussion of how the structural model can be used to evaluate market power and the value of capacity investments. 4 1.1 Literature and Positioning This paper promotes the analysis of spatial price variation using tools from optimization theory. We provide a specic application to electricity markets which use locational marginal prices. Locational marginal prices assign a price at each location in the market which corresponds to the marginal cost of supplying that particular location. Hogan et al. [1996] rst proposed the use of mathematical programming to determine electricity prices; since then, locational marginal prices have come to dominate North American electricity markets. While there is some evidence that the locational prices are succeeding at improving market eciency [c.f. Wolak, 2011], there are potential drawbacks such as increased market power as discussed in Cardell et al. [1997]. As identied by Sahni et al. [2012], locational priced markets also present much higher analytical hurdles both for participants and observers which a methodology like ours may contribute to mitigating. The methodology we introduce to uncover the market structure is a type of inverse optimization. Inverse optimization algorithms uncover unknown parameters of a mathematical programming formulation from observations of the optimal solution. Since the important paper of Zhang and Liu [1996], a breadth of research has focussed on the following inference problem: given an observable feasible space and an optimal solution to the optimization problem, determine unobservable parameters of the objective function. This problem has been addressed for both linear programming problems [Yang and Zhang, 2007, Ahuja and Orlin, 2001, Zhang and Liu, 1996, 1999] as well as various non-linear mathematical programming paradigms [Schaefer, 2009, Wang, 2009, Iyengar and Kang, 2005]. Our problem diers particularly in that the unobservable parameters of interest dene the feasible region and we assume that relevant parts of both the primal (allocation) and dual (prices) solutions are observable. In the standard inverse optimization problem, the solution is not fully identied. The common practice is to search for norm minimizing values for the missing parameters. In our case, the solution is fully identied and with sucient data may be determined by solving a system of linear equations. Several clear themes link this work to studies in the economic literature which examine failure of the `law of one price' in regional trade. Pioneered by the works of Engel and Rogers [1996] and [Mccallum, 1995], these studies have focussed on the goal of understanding how distance and border eects contribute to price dierences for physically identical goods. Recent treatments of the problem have incorporated sophisticated models of market power [Cosar et al., 2012] and have tested these results with non-aggregated microdata [Broda and Weinstein, 2008]. The work is characterized by a common underlying model of the contribution of distribution infrastructure to variation in prices: transportation frictions (i.e., borders and distance) are separable, observable, and contribute to price variation in a manner which is consistent across samples. This model may be appropriate for prices of hard goods such as wind turbines as studied in Cosar et al. 5 [2012]. However, the electricity produced by these turbines diers in a fundamental manner: the preferred and lowest cost distribution method is via a capacitated technology. Adding new transmission lines requires fullling time consuming regulatory requirements and providing large capital investments. Capacity issues which make it more dicult to close supply-demand gaps are compounded by seasonal demand uctuations and the inability to relocate production. Natural resources are tied to a specic geography. This applies to natural gas and oil as well as renewable electricity fuels such as hydro and wind. Which transportation link is at capacity varies sample to sample; as a result, these frictions contribute in dierent manners to the overall price. In electricity markets, the variation in eects is particularly acute. For example in the MISO market, in 2012, 1295 unique transmission linesthe relevant resourcewere at capacity. Of these 89% were at capacity in no more than 1% of the hours and no constraint was active in more than 30% of the hours. Ecient use of data is important for deriving estimates from multiple market states. Our results show that identication of the pricing mechanism requires only one more sample than there are links at capacity. In natural gas and other energy markets, theoretical models have been proposed which incorporate capacitated technologies into equilibrium models as explanatory of geographic price dierences [Mudrageda and Murphy, 2008, Cremer et al., 2003]. We believe that in addition to applications to electricity markets, this paper may be taken more generally as an extention of this line of theoretical research equating price variation to market wide capacity constraints. In Section 5 we discuss how our methodology may be applied to the inference of market frictions when a centralized institutional structure is lacking. 2 Model and Preliminary Analysis We focus on a one-shot electricity market facilitated by a centralized market maker. In North American electricity markets, the facilitator would be either an independent service operator (ISO) or regional transmission organization (RTO). The model is appropriate to either forward (day-ahead) or spot (real-time) energy markets. In both of these markets, the participants submit bids while the facilitator solves an ecient allocation problem under the assumption that these bids accurately reect the participants' preferences, i.e., if the bids are honest, the facilitator maximizes social welfare. A bid from a participant consists of a supply function and bounds on the interval they are willing to supply. The facilitator also takes into account how nodes are connected by transmission resources, the capacity of these transmission resources, and other characteristics determining how electricity ows through the network. The market features a set of generators. Load serving entities are located at buses connected by transmission lines. Generators and load serving entities correspond to the set of market participants; buses correspond 6 to locations. The standard model of line load used by electricity markets is a linear approximation of the true AC power ows referred to as the DC model. Details of the linearization are discussed in Appendix A. In the linear model, the net production at a particular bus will have a proportional aect on ow over a particular line. In eect, the transmission lines are shared transmission resources whose utilizations are determined by locational parameters. Other than increasing and decreasing production and consumption at particular locations in the network, the facilitator and participants are not able to inuence the routing of electricity as they might be able in other network settings such as a data network. 2.1 Market Model The initial model we present is specialized to electricty markets. In Section 5.1, we discuss application of the model and methodology to other commodity markets. The electricity market model consists of a set of participants who produce or consume electricity which is distributed through shared transmission resources. Participants: The market features a set of participants Q = {1..Q} where participant j produces xj MWh of electricty. Consumption is interpreted as negative production, i.e., xj < 0. Transmission Resources: The market is connected via a set of transmission links, R = {1..R}. Each of these links may be interpreted as a shared resource with a xed capacity. Link r has capacity dr . This endowment is utilized heterogenously by participants when they produce or consume electricity. Each participant produces or consumes electricity at one of a set of possible locations, L = {1..T }. Variable bi denotes the net production at location i. The location determines the rate at which a participant utilizes each resource. In particular, for each location i and resource r, there is a parameter Dri , such that, if bi units are produced at location i, then Dri bi units of resource r are consumed. Allowing b to be the vector of net production for each location, b is feasible if Db ≤ d. We let l(j) be the location of participant j and P q(i) be the set of participants operating at location i. So, net production at location i is bi = j∈q(i) xj . In the context of an electricity market, a location would be equivalent to an electricity bus though alternative interpretations are possible in other settings. Transmission lines are imperfect and some proportion of production is lost in the form of heat. We use an ane model to accomodate this line loss which associates a parameter δi equal to the proportion of production lost from each location. This notion of loss is compatible with the MISO electricity market where a similar loss factor is associated with each producer [Ma et al., 2009a]. The total system production must P P equal total consumption plus losses: i∈L bi + i∈L δi bi = 0 . Note that participants that are co-located may trade electricity amongst themselves without loss or impact of transmission capacities; however, positive 7 or negative net production at each location may impact loss from the network and consumes transmission resources. Market Mechanism: Production and consumption bids are described by a real valued supply function Sj to each participant j . The function Sj is dened over an interval [`j , uj ], where Sj (xj ) is the announced total cost to participant j of producing xj MWh. In the case of consumption, xj < 0, Sj (xj ) is the announced negative of the value to participant j of consuming −xj MWh. While participants will typically behave as either producer `j ≥ 0 or consumer uj ≤ 0, there is nothing which precludes participants from behaving as both. For instance, the owner of a partially charged battery operating in an electricity market may wish to P either buy or sell electricity depending on price. The total system welfare is − j∈Q Sj (xj ). The facilitator solves the ecient allocation program which determines the ecient production and consumption from each participant under the assumption that bids are an accurate reection of true preferences: min x X (1) Sj (xj ) j∈Q s.t bi = X ∀i ∈ L, xj (2) j∈q(i) X (1 + δi )bi = 0, (3) i∈L Db ≤ d, (4) x ≥ `, (5) x ≤ u. (6) In the above model, prices may be assigned to each location. The ecient allocation program has linear constraints but typically also has quadratic relationships and integer variables due to non-linear supply functions. The linearized version that we focus on here is, however, still used for computing prices as the marginal impacts from changing constraints. 2.2 Preliminary Analysis: Program 1-6 implies a series of optimality conditions gleaned from standard constrained optimization theory (see, e.g., Karush-Kuhn-Tucker conditions in Nocedal and Wright [1999]). Consider dual variables π, φ, ρ, α, γ corresponding to constraints 2-6. We refer to the vector −π as the locational prices and the vector −ρ as the resource prices. Then, the necessary conditions for optimality are: 8 Stationarity: X Dki ρk + πi + (1 + δi )φ = 0, Primal feasibility: ∀i ∈ B, X bi = k xj , ∀i ∈ L., j∈q(i) (7) −πl(j) + α − γ = S 0 (xj ), (9) X j ∈ P. (8) Dual feasibility: ρ, α, γ ≤ 0. (1 + δi )bi = 0, (10) Db ≤ d, (11) x ≥ `, (12) x ≤ u. (13) i∈L Complementary slackness: (14) ρ(Db − d) = 0, (15) γ(x − `) = 0, (16) α(u − x) = 0. (17) Note that these are necessary conditions for optimality under very general continuity condtions [c.f. Nocedal and Wright, 1999] and will hold even with piecewise and non-convex supply functions with the corresponding linearization. With the supply equilibrium constraint in Equation 3, the program is overspecied. To avoid this problem, it is standard in electrical engineering to select a reference location r which is assumed to not aect ow on the lines. The rth column of matrix D, which contains the parameters for the utilization of each resource, corresponds to the reference location and is a zero vector; the parameters for other locations will compensate. These is a dierent matrix D for each reference location though the optimization problem remains equivalent. The following observations allows the unobserved dual variable φ to be removed. Proposition 1. In an optimal solution of Program 1,D and φ have the following properties: 1. Buses with zero coecients in all active D constraints have the same lmp. 2. φ is equal to the lmp of the reference location corrected by the loss. φ= 9 −πr . (1 + δr ) (18) 3 Recovery of Market Structure from Outcomes In this section, we discuss the identication of utilization parameters for resources which are scarce in at least one data sample, i.e., the link is at full capacity and has a non-zero resource price. Since resource prices are identically equal to zero for any transmission resource with surplus capacity, parameters for these resources cannot be identied; hence, we exclude these resources from the recovery procedure. We refer to resources with a positive resource price as active resources. By convention, each row of D corresponds to a resource which is active in at least one sample. Consider a set of noise-free data samples S = {1..s}, where a sample j = {−π j , −ρj , x} species observed commodity, resource prices, and allocations. Let K be the set of active resources in the data samples. The identity φ = −πr (1+δr ) from Proposition1 allows φ to be replaced in Equation 7. Rather than determining δi , the relative productivity of a location i to the reference location, qi = 1+δi 1+δr , will be explored. With the relative productivity vector q , an equivalent program to the Ecient Allocation program can be formulated P by replacing the constraint i∈L (1 + δi )bi = 0 with the equation: X (19) qi bi = 0. i∈L The following theorem details how and when the unknown parameters are identied. Theorem 2. For an arbitrarily selected reference location r, consider equations: X Dki ρjk = qi πrj − πij , ∀i ∈ L, j ∈ S, (20) ∀i ∈ L, j ∈ S, (21) k∈K X xjp = bji , p∈q(i) X (22) qi bi = 0; i∈L then, D and q are identied if the above system of equations has rank (|K| + 1)|L|. The remaining unknown parameters of the primal program, d, may be recovered from Dk bj = dk . Proof. Equation 20 is simply Equation 7 updated with φ = qi πr as stated in Proposition 1. The system of equations contains (|K| + 1)|L| variables which are the elements of matrix D and q , requiring the system to have this rank for a unique solution. Dk bj = dk follows directly from the primal formulation. Dk is identied as just stated, and bj is specied from data allowing d to be identied. 10 Corollary 3. Consider a set of resource constraints K active in a set of samples S . Then, consider the bipartite constraint graph G dened by nodes K ∪ S and edges (k, s) if constraint k is active in sample s. The coecients D for constraints K and the relative productivities q are identied if and only if for each connected component in G with constraints Kc and samples S c , the rank of X Dki ρjk = qi πrj − πij , ∀i ∈ L, j ∈ S c , (23) k∈K is greater or equal to |Kc ||L| and, for one connected component, the rank of these equations is greater or equal to (|Kc | + 1)|L|. Estimation: Error may be incorporated into this model with the additive residual for each of Equations 20. Assuming a normal error distribution and that each equation has error with zero mean and common variance, the least-squares solution is an appropriate estimator when the problem is overspecied (rank is greater than (|K| + 1)|L|). We use this single step estimator in the application section. Through the remainder of the text, we refer to this estimation procedure as the IO recovery algorithm. 3.1 Numerical Example The example is illustrated in Figure 2. There are four locations (buses), L = {1..4}, connected into a single cycle by lines {1 − 2, 2 − 3, 3 − 4, 1 − 4}. We assume no transmission loss (i.e., q = [1, 1, 1, 1]). Flow may go in either direction on these links (i.e., y may be negative). The links have reactance parameters z = 1 except for the link 1-4 where z = 2. The topology and reactances contribute to the matrix D. The precise derivation is described in Appendix A. There is a single capacity constraint on line 2-3 which limits the absolute ow to 50MW. The positive and negative ows over lines 1 − 2, 2 − 3, 3 − 4, and 1 − 4 are respectively in rows 1,5; 2,6; 3,7 and 4,8 with d containing the corresponding limits. As discussed, only capacity constraints that are active in a sample are relevant; so, we may ignore resources with innite capacity. In this market, there is a single price-taking demand participant at Bus 4 requiring 300MW. This may be supplied from a low price producer at Location 1 or a high price producer at Location 3. Location is important in determining resource utilizations. If 100 additional MW are requested at Location 4, depending on whether it is supplied from Location 1 or Location 3, ow over the transmission line 2-3 will either increase by 40MW or decrease by 20MW. The following shows that this can dramatically aect prices throughout the network. Table 1, describes two bidding scenario which are identical except for demand at Location 4. Scenario 1 11 Participant 1 Min:0 MW Max:500 MW Cost:$20/MWh z=1 1 2 Limit: 50MW z=2 Participant 4 Min:-300 MW Max:-300 MW Participant 2 Min:0 MW Max:0 MW z=1 z=1 4 3 Participant 3 Min:0 MW Max:500 MW Cost:$30/MWh Figure 2: Electricity Network in Numerical Example for Sample 2. Participant 1 3 4 Scenario 1 S1 (x1 ) = 20x1 , `1 = 0, u1 = 500 S3 (x3 ) = 30x3 , `3 = 0, u3 = 500 S4 (x4 ) = 0, `3 = 100, u3 = 100 Scenario 2 S1 (x1 ) = 20x1 , `1 = 0, u1 = 500 S3 (x3 ) = 30x3 , `3 = 0, u3 = 500 S4 (x4 ) = 0, `3 = 300, u3 = 300 Table 1: Bidding scenarios for the numerical example is the low demand scenario; Scenario 2 is the high demand scenario. We will illustrate the ecient allocation program and use the outputs as data samples to demonstrate the IO recovery algorithm. The optimal ow problem for Scenario 1 is: max20x1 + 30x2 (24) s.t. x = b, (25) b1 + b2 + b3 + b4 = 0, (26) 0.4b1 + 0.6b2 − 0.2b3 ≤ 50, (27) −0.4b1 − 0.6b2 + 0.2b3 ≤ 50, (28) x ≥ [0, 0, 0, −100], (29) x ≤ [500, 0, 500, −100]. (30) 12 Solving this program results in: x1 =[100, 0, 0, −100], −π 1 =[20, 20, 20, 20], −ρ1 =[0], φ1 =20. In Scenario 2, the production bounds in ecient allocation program change to: x ≥ [0, 0, 0, −300], (31) x ≤ [500, 0, 500, −300]. (32) Solving the program for Scenario 2 results in: x2 =[183.3, 0, 116.7, −300], −π 2 =[20, 16.7, 30, 26.7], −ρ2 =[16.7], φ2 =26.7. In Scenario 2, transmission resource 2-3 is at capacity resulting in price variation throughout the network. φ2 is equal to the price at bus 4, the reference bus, in this lossless system. Switching to the IO recovery algorithm, assume that there are two data samples (x1 , −π 1 , −ρ1 ) and (x2 , −π 2 , −ρ2 ). 13 Equations 7 provide: −D11 20 − 20 + φ1 = 0, −D12 20 − 20 + φ1 = 0, −D13 20 − 20 + φ1 = 0, −D14 20 − 20 + φ1 = 0, −D11 16.7 − 20 + φ2 = 0, −D12 16.7 − 16.7 + φ2 = 0, −D13 16.7 − 30 + φ2 = 0, −D14 16.7 − 26.7 + φ2 = 0. The next step is to select the reference location and substitute −qi πrs for φs in each constraint after which the system of equations can be solved. Selecting r = 4 recovers the original D = [0.4, 0.6, −0.2, 0] and q = [1, 1, 1, 1]. An alternative selection such as r = 1 results in D = [0, 0.2, −0.6, −0.4] which is equivalent P once b∈L b = 0 is taken into account. 4 Application to the MISO Electricity Market MISO provides day-ahead energy, real-time energy and ancillary services markets for 11 American states and a Canadian province. We perform a case study applying the IO recovery algorithm to the MISO day-ahead energy market in March of 2011. This market uses locational marginal pricing and ts the model described in Section 2.1. In the day-ahead energy market participants include generators (supply), load-serving entities (demand), and strictly nancial players who participate on both sides of the market. We will continue to refer to market submissions on both sides of the market as bids though MISO refers to supply and demand bids respectively as oers and bids. Regardless of type, participants are treated simlarly and submit marginal cost functions as non-decreasing piecewise linear functions. In the case of generators, these may have up to 10 regions and may be either step functions or continuous. For other participants, only a step function with up to nine components may be submitted. The resulting supply functions, which return the total cost, are either piecewise linear or piecewise quadratic such that the Ecient Allocation program is a mixed integer quadratic program (QIP). 14 4.1 Empirics The goal of the emipirical section is to show that in the context of a real market with noisy data, the IO recovery algorithm generates informative parameters regarding the relative utilization of key resources and the relative loss throughout the network. To show this result, we rst recover the parameters and then test their accuracy by using them to generate new prices from bids and by comparing those prices to the prices generated by MISO for that same set of bids. We focus our attention on several cases taken from the day ahead market in March of 2011. For each of these examples we do the following: 1. Recovery: Using the IO recovery algorithm recover constraints K using a set of identifying samples SK . 2. In-sample testing: (IS) Take bids and oers for s ∈ SK determine prices using Program 1 and the estimated constraints. 3. Out-of-sample testing: (OS) Take bids and oers for s ∈ / SK where active constraints are a subset of K and determine prices using Program 1. 4. Well-out-of-sample testing: (WOS) Take bids and oers for s ∈ / SK where active constraints are not a subset of K and determine prices using Program 1. 5. Comparison to benchmarks: To test the model quality, we compare the accuracy to prices generated with two benchmark algorithms. The rst benchmark (NC) is a model where we assume that there is no congestion, performing an unconstrained uniform price auction with the observed bids. The second benchmark (UC) produces prices using a naive data-centric approach: assume prices remain unchanged from the rst sample. We have selected three cases (Test1-Test3) from March of 2011 where respectively 1, 2, and 3 unique constraints are active and identied. For each case, |K| + 1 IS samples are selected and in all cases yield a full rank system of linear equations. OS and WOS samples were also selected from March of 2011. For each example, three OS samples were selected closest in date to the example IS. WOS samples are common to all examples. The four WOS samples were randomly selected satisfying the conditions that (1) they occured in March of 2011, (2) the active set is not a subset of the example active sets, and (3) they respectively have 1, 3, 5, and 7 active constraints. All tests have the same WOS set. The hours and active constraints are reported in Table 2. The procedure also assumes that we have an accurate map from participant to location. This was accomplished by clustering participants and locations with persistent identical prices. They are assumed to remain constant over the course of the study. 15 ID Hour Test1: Active Resources ID Hour Active Resources Test2: In sample In sample IS-1-0 2011-03-26 23:00:00 680 IS-2-0 2011-03-26 23:00:00 680 IS-1-1 2011-03-26 02:00:00 680 IS-2-1 2011-03-27 00:00:00 680:1473 IS-2-2 2011-03-26 02:00:00 680 Out of sample Out of sample OS-1-1 2011-03-26 03:00:00 680 OS-2-1 2011-03-26 03:00:00 680 OS-1-2 2011-03-27 11:00:00 680 OS-2-2 2011-03-27 01:00:00 680:1473 IS-3-0 2011-03-18 12:00:00 100:1526:1530 WOS-?-0 2011-03-18 21:00:00 1531 IS-3-1 2011-03-18 13:00:00 100:1526:1530 WOS-?-1 2011-03-20 22:00:00 100:680:1530 IS-3-2 2011-03-18 14:00:00 100:1526:1530 WOS-?-2 2011-03-13 11:00:00 3:307:1257:1496:1504 IS-3-3 2011-03-18 18:00:00 100:1526:1530 WOS-?-3 2011-03-20 14:00:00 100:307:1329:1358:1473:1530:1537 OS-3-1 2011-03-18 19:00:00 100:1530 OS-3-2 2011-03-18 20:00:00 100:1526:1530 Test3: All Tests: In sample Well out of sample Out of sample Table 2: Samples for each test case Determining Market Outcomes: To determine the predicted LMPs and MW allocations, one may solve the QIP using the complete supply function bids in Program 1. Then, x the binary variables and resolve the model to determine the shadow prices and associated LMPs. This would correspond to the sequence of operations for MISO which rst solves the Unit Commitment problem and then solves the continuous Economic Dispatch problem to determine prices. This process is computationally challenging due to the large number of binary variables. To speed up the process we essentially allow MISO to solve the Unit Commitment by taking advantage of our knowledge of the true allocation values and force the MW values to be in the observed active segment of their supply function. We then solve what is an estimated version of the ecient allocation program to determine prices and the precise allocation. Comparison with the network when there are no transmission constraints illuminates the degree to which the transmission structure is impacting these outcomes. 4.2 Results Results of applying the IO pricing mechanism are summarized in Table 3. These may be contrasted with the NC and UC benchmarks reported in Tables 4 and 5. The measures we consider are the correlation of predicted LMPs with true LMPs, the average absolute error between the predicted and true LMPs, and the squared error associated with the linear regression lmptrue = a · lmppred + b . We also report on the same values when the outliers are removed. An outlier is dened as the 3% of locations with maximum absolute LMP error. To summarize Table 3, we nd that the prices generated using the derived constraints are very highly 16 ID All locations and participants corr-coef IS-1-0 IS-1-1 OS-1-0 OS-1-1 WOS-1-0 WOS-1-1 WOS-1-2 WOS-1-3 IS-2-0 IS-2-1 IS-2-2 OS-2-0 OS-2-1 WOS-2-0 WOS-2-1 WOS-2-2 WOS-2-3 IS-3-0 IS-3-1 IS-3-2 IS-3-3 OS-3-0 OS-3-1 WOS-3-0 WOS-3-1 WOS-3-2 WOS-3-3 1.000 0.999 1.000 0.951 0.723 0.990 0.349 0.974 1.000 0.470 0.993 0.842 0.456 0.723 0.988 0.350 0.974 0.416 0.567 0.399 0.384 0.157 0.210 0.140 0.744 0.314 0.581 std-reg-err 0.020 0.042 0.026 7.023 9.832 0.334 16.542 1.048 0.010 107.298 0.311 11.519 109.181 9.830 0.338 16.432 0.926 687.254 180.706 669.760 4 119.915 26 105.651 28 562.426 9 407.944 21.780 205.694 105.566 avg-abs-err 2.522 1.606 0.976 7.555 5.909 2.471 13.150 1.605 2.672 2.391 1.747 1.156 2.126 5.910 2.379 13.159 1.783 16.285 10.037 7.053 12.192 27.242 23.414 14.741 2.684 14.314 2.348 Outliers removed mw avg-abs-err 12.588 10.778 10.363 14.457 16.013 9.473 16.522 8.022 12.703 11.789 10.841 10.157 12.482 15.706 9.505 16.861 8.065 14.777 16.094 14.075 14.216 13.372 13.661 16.742 10.480 18.432 8.744 corr-coef 1.000 0.999 1.000 0.951 0.744 0.991 0.392 0.982 1.000 0.987 0.999 0.996 0.993 0.744 0.989 0.392 0.982 0.978 0.978 0.988 0.971 0.946 0.895 0.710 0.984 0.369 0.987 std-reg-err 0.015 0.034 0.020 6.292 8.487 0.290 15.433 0.705 0.007 0.844 0.041 0.188 0.436 8.486 0.290 15.329 0.635 1.796 1.589 0.764 2.039 8.151 11.000 17.789 0.765 17.289 0.818 avg-abs-err 2.501 1.564 0.929 7.169 5.679 2.301 12.860 1.465 2.662 2.075 1.690 1.008 1.830 5.679 2.192 12.870 1.641 13.296 8.443 3.830 5.373 12.121 7.694 5.303 2.092 12.883 1.035 mw avg-abs-err 4.001 4.157 4.044 3.727 4.693 4.125 4.559 3.451 4.076 3.734 4.265 4.064 4.238 4.520 4.138 4.651 3.503 4.246 4.360 3.851 3.863 3.491 3.625 5.114 4.196 5.535 3.501 Table 3: Price and MW predictions from using IO recovery algorithm correlated with the true prices and error can largely be removed through a linear shift. In the case of Tests 2 and 3, there are a small number of outliers which degrade the summary statistics. When outliers are removed the delity of both Test 2 and 3 improves dramatically. For example, for IS-3-0, removing the outliers improves correlation from 0.416 to 0.978 and the standard error is reduced from 687.254 to 1.796. We expand in Section on the distribution of error in our detailed discussion of Test 3 below. Below, we will discuss in depth how this error is systemic and appears to reect missing generation data in a congested region of the network. In Section 5 we discuss how the recovered model may be used to infer the magnitude and location of the unobservable data. Table 4 shows the same outcome of the UC benchmark when transmission constraints are removed from the ecient allocation problem. Price dierences produced in this model are then exclusively a result of the recovered loss coecients. Without the transmission constraints, the correlation between predicted and observed prices decreases for in-sample and out-of-sample tests with the exception of Test 3. The UC benchmark does not produce systemic outliers; removing outliers produces little improvement. Excluding outliers, IO recovery algorithm performs better than the UC benchmark with respect to average absolute pricing error on all instances and with respect to correlation on all but four instances. The four instances which show higher correlation when the UC benchmark is used are well-out-of-sample tests. This may be explained by loss parameters but not transmission constraints being consistent across the in-sample and well-out-of-sample tests. 17 ID All locations and participants corr-coef IS-1-0 IS-1-1 OS-1-0 OS-1-1 WOS-1-0 WOS-1-1 WOS-1-2 WOS-1-3 IS-2-0 IS-2-1 IS-2-2 OS-2-0 OS-2-1 WOS-2-0 WOS-2-1 WOS-2-2 WOS-2-3 IS-3-0 IS-3-1 IS-3-2 IS-3-3 OS-3-0 OS-3-1 WOS-3-0 WOS-3-1 WOS-3-2 WOS-3-3 0.893 0.872 0.871 0.947 0.839 0.854 0.470 0.891 0.893 0.900 0.872 0.871 0.897 0.839 0.854 0.470 0.891 0.749 0.761 0.719 0.749 0.714 0.812 0.856 0.552 0.614 0.612 std-reg-err 0.386 0.456 0.460 0.200 0.563 0.512 1.488 0.389 0.386 0.362 0.456 0.460 0.371 0.563 0.512 1.484 0.389 0.407 0.389 0.446 0.408 0.455 0.316 0.247 0.638 0.578 0.574 avg-abs-err 5.891 5.578 5.414 12.449 8.112 4.512 15.192 4.778 5.891 5.992 5.578 5.414 5.506 8.112 4.512 15.221 4.778 18.133 12.337 8.340 9.486 17.475 11.629 8.813 5.306 15.922 5.613 Outliers removed mw avg-abs-err 20.187 16.412 16.445 26.323 21.059 13.475 21.887 12.025 20.057 18.213 16.294 16.331 18.471 20.922 13.351 21.808 11.905 23.058 22.299 21.076 20.565 21.693 20.557 23.029 15.660 24.198 14.142 corr-coef 0.894 0.892 0.897 0.944 0.839 0.890 0.482 0.925 0.894 0.899 0.892 0.897 0.894 0.839 0.890 0.482 0.925 0.728 0.737 0.703 0.725 0.686 0.785 0.863 0.579 0.604 0.632 std-reg-err 0.362 0.383 0.371 0.196 0.545 0.402 1.411 0.278 0.362 0.345 0.383 0.371 0.357 0.545 0.402 1.407 0.278 0.354 0.342 0.402 0.373 0.400 0.288 0.204 0.627 0.554 0.554 avg-abs-err 5.741 5.436 5.272 12.338 8.036 4.262 14.958 4.600 5.741 5.844 5.436 5.272 5.354 8.036 4.262 14.987 4.600 17.948 12.190 8.197 9.231 17.228 11.486 8.736 5.088 15.686 5.462 mw avg-abs-err 5.622 5.451 5.409 7.242 5.857 4.929 6.151 4.216 5.622 5.183 5.451 5.409 5.686 5.857 4.929 6.158 4.196 5.800 5.312 4.925 4.831 4.956 4.903 6.179 5.105 6.784 4.337 Table 4: Benchmark 1 NC: no transmission constraints Table 4 shows the same outcome of the NC benchmark when the rst hour's price and allocation are used to predict prices in the other hours. The hours used for the prediction are respectively IS-1-0, IS-2-0, and IS-3-0. Broadly, this predictor does not suer from large outliers and as a result does well on the in-sample and out-of-sample tests. The IO mechanism is competitive with the NC on these IS and OS tests and of course uses recovered market structure as opposed to market outputs to construct prices. Figures 3 and 4 show the relationship between predicted and true prices for in-sample and out-of-sample conditions for Tests 1 and 2. In both cases there is high correlation between the predicted and true prices. In the case of Test 2, Figure 4-b shows that error in the out-of-sample test largely results from several outliers which are priced very high and are the major contributors to the error. Figure 5 compares recovered coecients for constraints 680 and 1473 using dierent sets of samples to generate them. A log scale is used on the graph which accentuates error in the smaller coecients. There is strong correlation between recovered coecients, particularly at higher coecient values (top right corner of the plots). The prices output for Test 3 are shown in Figure 6. Unlike the other examples, in this case there is a set of locations which appear to be systematically overpriced by the model. In the case of Test 3 the demand is larger resulting in higher levels of congestion and this set of nodes is isolated from the rest of the market. In Section 5.3 we discuss how this may be an artifact of missing supply and how the structural model may be used to uncover this unobservable supply. 18 ID All locations and participants corr-coef std-reg-err avg-abs-err Outliers removed mw avg-abs-err corr-coef std-reg-err avg-abs-err mw avg-abs-err IS-1-0 IS-1-1 OS-1-0 OS-1-1 WOS-1-0 WOS-1-1 WOS-1-2 WOS-1-3 0.999 0.999 0.968 0.717 0.993 0.342 0.973 0.060 0.066 1.957 14.868 0.437 27.040 1.621 2.270 2.756 7.036 3.464 6.058 9.795 4.172 10.811 11.414 16.217 23.258 19.450 19.041 19.724 0.999 0.999 0.967 0.739 0.993 0.387 0.978 0.046 0.051 1.827 12.764 0.394 24.698 1.311 2.192 2.679 6.875 3.198 5.976 9.458 4.070 6.436 7.015 9.437 13.435 12.894 12.251 12.978 0.999 0.999 0.999 0.994 0.717 0.993 0.342 0.973 0.082 0.060 0.066 0.373 14.868 0.437 27.040 1.621 0.423 2.270 2.756 0.615 3.464 6.058 9.795 4.172 8.072 10.811 11.414 11.069 23.258 19.450 19.041 19.724 0.999 0.999 0.999 0.995 0.739 0.993 0.387 0.978 0.072 0.046 0.051 0.281 12.764 0.394 24.698 1.311 0.388 2.192 2.679 0.565 3.198 5.976 9.458 4.070 3.423 6.436 7.015 5.350 13.435 12.894 12.251 12.978 0.999 0.998 0.965 0.997 0.990 0.802 0.947 0.441 0.964 0.045 0.091 1.457 0.138 0.405 7.622 2.201 17.190 1.514 5.837 9.847 8.650 0.785 6.503 9.332 17.981 4.129 16.038 2.024 4.260 8.680 7.106 7.254 11.865 35.288 17.009 31.741 0.999 0.998 0.963 0.998 0.990 0.828 0.953 0.514 0.970 0.045 0.086 1.461 0.070 0.404 6.516 1.880 15.869 1.251 5.796 9.796 8.595 0.715 6.433 9.170 17.837 3.898 15.876 0.828 1.981 4.620 3.952 3.837 5.901 22.146 9.188 18.925 IS-2-0 IS-2-1 IS-2-2 OS-2-0 OS-2-1 WOS-2-0 WOS-2-1 WOS-2-2 WOS-2-3 IS-3-0 IS-3-1 IS-3-2 IS-3-3 OS-3-0 OS-3-1 WOS-3-0 WOS-3-1 WOS-3-2 WOS-3-3 Table 5: Benchmark 2 UC: uses rst IS sample prices and allocations for each test (a) In Sample (b) Out of Sample Figure 3: Predicted vs. True LMPs, predictions using Test 1 with one active constraint 19 (a) In Sample (b) Out of Sample Figure 4: Predicted vs. True LMPs, predictions using Test 2 with two active constraint (a) Constraint 680, x-axis:IS-1-0,IS-1-1 y-axis:OS-1-0,OS-1-1 (b) Constraint 680, x-axis:IS-2-0,IS-2-1,IS-2-2 y-axis:IS-2-0,OS-2-1,OS-2-2 (c) Constraint 1473, x-axis:IS-2-0,IS-2-1,IS-2-2 y-axis:IS-2-0,OS-2-1,OS-2-2 Figure 5: Recovered D coecients from dierent sample sets (a) In Sample (b) Out of Sample Figure 6: Predicted vs. True LMPs, predictions using Test 3 with three active constraint 20 The results indicate that the recovered market structure is quite accurate: the prices are very highly correlated with the true prices and removing the constraints reduces quality except where they should not be included (the well-out-of-sample tests). However, we are not able to precisely predict prices which tend to be shifted either as a whole or in bunches of similar locations. Missing supply and demand was mentioned above. MISO allows participants to also participate in both the ancillary services market and energy markets. A subset of participants also partaking in selling regulatory volumes may also be a cause of error (Ma et al. [2009b]). 5 Applications and Extensions 5.1 Application to Competitive Markets In this section we show that a competitive market where variation in prices is explained through competition for a set of resources with xed capacities is amenable to the methodology developed for electricity markets. We present sucient assumptions for an equilibrium in such a market to satisfy the twin market operations and pricing requirements, i.e., any equilibrium results in an ecient allocation with prices corresponding to the dual solution. The IO recovery algorithm can then be directly applied. In some likely applications the frictions driving price dierences may also be due to a capacitated transportation resource and the relative location of producers and consumers. For instance, pipeline capacity may explain large locational price dierences in natural gas markets. The central assumption is that empirically observed commodity prices correspond to optimal dual values of the production constraints in the optimal solution, i.e., standard necessary conditions for optimality provide a set of linear equations where variables are the resource utilization parameters. This pricing assumption may be justiable in decentralized markets which are in competitive equilibrium. The relationship between dual prices in mathematical programming models and competitive equilibria have long been well known (see, e.g., Samuelson [1952] and Takayama and Judge [1964]). The competitive market model augments the electricity market model described in Section 2.1 by also considering the set of resource holders. We assume that each resource is held by several arbitrarily small agents each with marginal cost of zero. Then, consider the set of prices (−π, −ρ) where a producer receives revenue −πl(i) xi and resource holders in total produce θj of resource j and receive revenue −ρj θj where P θj = i Dji bi . For simplicity we assume that the market is lossless. The notion of a competitive equilibrium identied here requires (1) that agents are prot maximizing given prices and production constraints, (2) the market allocation is feasible, and (3) no arbitrage opportunities 21 exist. We implement (3) by asserting that no agents can protably purchase the commodity at location a, pay the resource costs for that location, and receive the price for the reference location. The following theorem shows that these conditions are sucient to ensure that the identifying equations are sound and that the IO recovery algorithm is applicable. Theorem 4. Consider prices (−π, −ρ) and production (x, θ). If this system is in a competitive equilibrium, then Equations 7-17 hold. Proof. φ may be interpreted as the marginal production cost of the lowest cost unconstrained production. α and γ may be interpreted as the dierence between the received price and the marginal production cost for each participant. CE⇒Primal Feasibility (Equations 9-13): Requirement (2) is a statement of primal feasibility (Equations 9-13). CE⇒Dual Feasibility (Equation 14) : ρ ≤ 0 is a statement of non-negative resource prices which is consistent with our statement that the marginal resource production cost is zero for all agents. CE⇒Complementary Slackness (Equations 15-17): Equation 15 is equivalent to ρ = 0 ⇐ Db < d and ρ < 0 ⇒ Db = d; these must hold since any price greater than zero would allow a resource holder to protably increase production if there is surplus capacity. Equation 16 is equivalent to γ = 0 ⇐ x > ` and γ < 0 ⇒ x = `; with Equation 8 negative γ implies pricing below the marginal cost in which case revenue is maximized by producing the minimum volume `. Equation 17 is equivalent to α = 0 ⇐ x < u and α < 0 ⇒ x = u; with Equation 8 negative α implies pricing above the marginal cost in which case revenue is maximized by producing the maximum volume u. CE⇒Stationarity (Equations 7 and 8): Equation 7 corresponds to eliminating the arbitrage opportunity. Equation 8 holds by assumption. 5.2 Transmission Constrained Market Power Variation The transmission constraint parameters are useful for measuring the distribution of market power through the market footprint. As noted in [Borenstein and Bushnell, 1999] and [Karthikeyan et al., 2013] and others, market power can vary substantially in an electricity and this variation can be both over the footprint and over time. This spatial and temporal variation depends on the impact of a particular participant on active constraints and the ability of particular participants to make up for production shortfalls which is dened by the loss parameters. To capture this variation, Lee et al. [2010] and Lee et al. [2011] argue that useful market power measures can be constructed by using the residual demand derivative for each participant in the market. The residual demand derivative denes the demand surplus that a particular participant would 22 (a) Active Constraint 680 (b) Active Constraints 680, 1473 (c) Active Constraints 100,1526,1530 Figure 7: Distribution of Residual Demand Derivative for Samples with Diering Active Constraints observe for a marginal increase in price. In the transmission constrained electricity market this depends both on the transmission constraint contribution factors D and the loss parameters δ .The residual demand derivative is a precise reection of the current market state and without some form of aggregation, does not necessarily reect the a priori beliefs of participants. These beliefs may provide a more accurate explanation for the structure of a particular bid. For instance, only the realized set, and no highly likely alternative sets, of active constraints are taken into account. Xu and Baldick [2007] show that in a lossless locationally priced electricity market, this realized residual demand derivative can be eciently calculated for each location given bids and the constraint factors of active constraints. To adapt this analyses to our model requires only a minor modication.In short, replacing the lossless energy balance equation with Constraint 3 and carrying through the same derivation as in Xu and Baldick [2007], the residual demand derivative for a particular participant i located at the reference bus becomes: RDD(i) = −δ̄iT Λi δ̄i + δ̄iT Λi D̄i (D̄iT Λi D̄i )−1 D̄iT Λi δ̄i where, δ̄i and D̄i are respectively the participant loss vector and the participant active constraint contribution matrix and relevant entries for i removed (i.e. δ̄i is Q − 1 dimension vector D̄i is a Q − 1 × Q − 1 dimension matrix). With b(j) denoting the bus at which participant j is located and for convenience allowing i = Q, (δ̄i )j = δb(j) and (D̄i )rj = Drb(j) .Λi is a diagonal matrix with entries corresponding to Sj00 (xj ) of each participant except i (i.e. Λi is a Q − 1 × Q − 1 dimension matrix). To calculate the residual demand derivative for a participant i not at the reference bus, δ and D can easily be refactored so that b(i) is the reference bus by dividing δ by δb(i) and zeroing the b(i)th row of D using δ . We have implemented this calculation for each of the market samples considered in Section 4. Histograms showing the distribution of residual demand derivatives for each participant are shown in Figure 7. This gure shows an outcome for each of the three test sets with one, two or three active constraints. Panels (a) 23 (a) Coecients of Constraint 100 (b) Coecients of Constraint 1526 (c) Coecients of Constraint 1530 Figure 8: Relationship between residual demand derivative and active constraint coecients for example IS-3-0 and (b) of Figure 7 show very similar ranges of market power though the less constrained example in Panel (a) appears to show a slightly more uniform distribution of residual demand derivatives. Panel (c) on the other hand is quite distinct. While the bulk of the participants face similar residual demand derivatives in the -20 to 0 range, there is a long left tail where participants face substantially higher magnitude residual demand derivatives. Figure 8 shows the relationship between the residual demand derivative and constraint coecients. There is little visible correlation in panels (a) and (b) between the residual demand derivative and coecients of constraints 100 and 1526. On the other hand Figure 8(c) shows that the residual demand derivative is highly correlated with coecients for constraint 1530. 5.3 Analytical Applications In this paper we have derived and demonstrated a novel structure recovery methodology. Structural approaches provide an advantage over purely statistical approaches by allowing analysis of hidden parameters and counterfactual analysis. Recovery of parameters of capacity lines, D and d, is an example of a hidden parameter not accessible through standard statistical methods. Relevant examples that have not yet been explored include uncovering unobservable supply and demand, estimating market power, and estimating the value of capacity investment. We will demonstrate a naive version of uncovering unobservable supply and demand and discuss implementation of the other analyses. While bids and outcomes are publicly reported for participants in most electricity markets, there are supply and demand sources in electricity markets that are not reported. These may include imports and exports and grandfathered bilateral transactions. In practice, these injections and withdrawals will not be decision variables in the optimal ow program but they nonetheless aect transmission constraints and thereby indirectly impact prices. As a result, when new prices are generated without this missing data, the lack of supply and demand in certain regions may bias prices. This type of error would appear as a systemic 24 (a) Original predicted LMPs (b) Predicted LMPs with 61MW at Location 1811 Figure 9: Predicted vs. True LMPs for Test 3 with and without absolute error minimizing additional supply bias in the price for a set of locations with similar coecients for one or more transmission constraints, which is consistent with the type of error observed in our predictions of MISO prices. This suggests a simple moment matching algorithm where supply is added at overpriced nodes and demand is increased at underpriced nodes until an error measure such as the absolute pricing error of the LMPs is minimized. Since Test 3 from Section 4 very clearly shows this type of bias we use this as a case study. The relation between predicted and actual MW for Test 3 in-sample IS-3-0 is depicted in Figure 9 Panel (a). A collection of outliers appears in the top right hand corner of this gure which are severely overpriced by the recovered model resulting in an absolute pricing error of $16.285. These outliers correspond to a set of pricing nodes in northern Wisconsin and Michigan. We have aggregate information on day ahead import plan and in this hour, 1511 MWh were planned for import from the IESO market (the Ontario energy market) and 1600 MWh from the PJM market. It is plausible that a substantial amount of energy may have moved into the MISO market from Ontario resulting in this descrepancy. Under this assumption, we applied a naive version of the moment matching technique by adding a supply of uncharged MWs of energy at location 1811 (MISO pricing node UPPC.AZ) which is one of the nodes in the cluster of outliers. The absolute error is reduced to $14.74 when 61MW are added. The resulting prices from the absolute error minimizer are shown in Figure 9. Recovered market structure allows estimates of the value of a capacity investment to be made which take into account market power. If there is substantial demand elasticity at the location, revenue may be much less than if the new capacity was sold at the previously observed prices. The revenue needs to take into account the price eect of the injection of capacity at that particular point in the network. Using the recovered model, the eect of the new capacity on the local price can be recovered by rerunning the allocation program with the new bids. More sophisticated analyses may consider strategic factors and also modify the 25 IS-3-0 15.6 15.5 15.4 Avg Absolute Error 15.3 15.2 15.1 15 14.9 14.8 14.7 14.6 0 20 40 60 80 100 120 MW at location 1811 Figure 10: Absolute error in price predictions as supply is increased at Location 1811 bidding strategies of other agents to ensure they remain rational. 6 Conclusion This paper develops an inverse optimization based method for recovering market structure from allocation and pricing data. The methodology takes advantage of a long history of equating market eciency and optimization. It is justied when the allocation and locational prices correspond to the primal and dual of the optimal assignment problem. The IO recovery algorithm implements the methodology in a context specialized for electricity markets where prices are determined by competition for transmission resources and line loss. The IO recovery algorithm recovers utilizations of transmission resources from the locational prices alone and with allocation information the capacities of these resources are also determined. In the context of a case study of the MISO electricity market, we studied the application of this methodology. Our ndings show that the recovered market structure does a good but imperfect job at reconstructing market prices. We hypothesize that the imperfections are primarily due to missing allocation data. In extensions, we discuss naive methods to estimate missing supply and demand for regions in the market. Markets where transportation links between regions are capacitated are challenging for managers because market conditions can shift dramatically depending on which constraints are active. The methodology put forward in this paper can help managers determine their bidding and operational strategies by using structural models to make market inferences and perform counterfactuals. Similarly, this may provide a mechanism for regulators to make inferences about markets where they cannot observe all transactions but have some knowledge of prices and allocation volumes. In the applications and extensions we have discussed estimation of missing missing market data, estimation of market power and analysis of investment decisions as particularly worthy of further study. With further renement of these and related techniques, structure recovery methodologies based on inverse optimization may be a useful analytical tool for participants in 26 institutionalized markets like electricity markets as well as decentralized but ecient markets exhibiting locational price variation. References Ravindra K Ahuja and James B Orlin. Inverse Optimization. Operations Research, 49(5):771783, 2001. Severin Borenstein and James Bushnell. Market Power in Electricity Markets: Beyond Concentration Measures. Energy Journal, 20(4):6588, 1999. Christian Broda and David E Weinstein. Understanding International Price Dierences Using Barcode Data Christian Broda. NBER Working Paper, (14017), 2008. Judith B Cardell, Carrie Cullen Hitt, and William W Hogan. Market power and strategic interaction in electricity networks. Resource and Energy Economics, 19(1):109137, 1997. A Kerem Cosar, Paul L E Grieco, and Felix Tintelnot. Borders , Geography , and Oligopoly: Evidence from the Wind Turbine Industry. 2012. Helmuth Cremer, Farid Gasmi, and Jean-Jacques Laont. Access to Pipelines in Competitive Gas Markets. Journal of Regulatory Economics, 24(1):533, 2003. Charles Engel and John H. Rogers. How wide is the border? The American Economic Review, 86(5):pp. 11121125, 1996. William W. Hogan, E. Grant Read, and Brendan J. Ring. Using Mathematical Programming for Electricity Spot Pricing. pages 209221, 1996. Garud Iyengar and Wanmo Kang. Inverse conic programming with applications. Operations Research Letters, 33(3):319330, May 2005. ISSN 01676377. doi: 10.1016/j.orl.2004.04.007. SP Karthikeyan, IJ Raglend, and DP Kothari. A review on market power in deregulated electricity market. International Journal of Electrical Power & Energy Systems, 48:139147, June 2013. Yen-yu Lee, Jin Hur, Ross Baldick, and Salvador Pineda. New Indices of Market Power. IEEE Transactions on Power Systems, 26(681-689):19, 2010. YY Lee, Ross Baldick, and Jin Hur. Firm-based measurements of market power in transmission-constrained electricity markets. Power Systems, IEEE Transactions, 26(4):19621970, 2011. 27 Xingwang Ma, Haili Song, Mingguo Hong, Jie Wan, Yonghong Chen, and Eugene Zak. The securityconstrained commitment and dispatch for midwest iso day-ahead co-optimized energy and ancillary service market. In Power & Energy Society General Meeting, 2009. PES'09. IEEE, pages 18. IEEE, 2009a. Xingwang Ma, Haili Song, Mingguo Hong, Jie Wan, Yonghong Chen, and Eugene Zak. The securityconstrained commitment and dispatch for midwest iso day-ahead co-optimized energy and ancillary service market. In Power & Energy Society General Meeting, 2009. PES'09. IEEE, pages 18. IEEE, 2009b. John Mccallum. National Borders Matter : Canada-U . S . Regional Trade Patterns. American Economic Review, 85(3):615623, 1995. M. Mudrageda and F. H. Murphy. An Economic Equilibrium Model of the Market for Marine Transportation Services in Petroleum Products. Operations Research, 56(2):278285, March 2008. Jorge Nocedal and Stephen J. Wright. Numerical optimization. 2nd edition, 1999. ISBN 9780387303031. Mandhir Sahni, Richard Jones, and Yunzhi Cheng. Beyond the Crystal Ball. IEEE Power & Energy Magazine, 10(4):3542, 2012. Paul A Samuelson. Spatial Price Equilibrium and Linear Programming. American Economic Review, 42(3): 283303, 1952. Andrew J. Schaefer. Inverse integer programming. Optimization Letters, 3(4):483489, June 2009. ISSN 1862-4472. doi: 10.1007/s11590-009-0131-z. B. Stott, J. Jardim, and O. Alsac. DC Power Flow Revisited. IEEE Transactions on Power Systems, 24(3): 12901300, August 2009. T Takayama and G Judge. Spatial Equilibrium and Quadratic Programming. Journal of Farm Economics, 46(1):6793, 1964. Lizhi Wang. Cutting plane algorithms for the inverse mixed integer linear programming problem. Operations Research Letters, 37(2):114116, March 2009. ISSN 01676377. doi: 10.1016/j.orl.2008.12.001. Frank A Wolak. Measuring the benets of greater spatial granularity in short-term pricing in wholesale electricity markets. American Economic Review, 101(3):247, 2011. Allen J Wood and Bruce F Wollenberg. Power generation, operation, and control. Wiley-Interscience, 2012. Lin Xu and Ross Baldick. Transmission-constrained residual demand derivative in electricity markets. Power Systems, IEEE Transactions on, 22(4):15631573, 2007. 28 Xiaoguang Yang and Jianzhong Zhang. Partial inverse assignment problems under norm. Operations Research Letters, 35(1):2328, January 2007. ISSN 01676377. doi: 10.1016/j.orl.2005.12.003. Jianzhong Zhang and Zhenhong Liu. Calculating some inverse linear programming problems. Journal of Computational and Applied Mathematics, 72:261273, 1996. Jianzhong Zhang and Zhenhong Liu. A further study on inverse linear programming problems. Journal of Computational and Applied Mathematics, 106:345359, 1999. A Electricity distribution as a linear allocation problem The standard assumptions underlying the DC approximation to AC power ow are (1) an assumption of no line loss, (2) bus voltage magnitudes are unit, and (3) voltage angle dierences across branches are small (the implications of these assumptions are discussed in Stott et al. [2009]). The standard formulation for the power ows (c.f. Wood and Wollenberg [2012]) uses voltage angles θi for each node i: X bi = 0, (33) Bθ = b, (34) θr = 0, (35) y(i,j) = 1 (θi − θj ), z(i,j) ∀(i, j) ∈ L. (36) Node r is the reference bus which is selected arbitrarily and set to zero so that the problem is well dened. −1 B is determined by per unit line reactances z(i,j) where for i 6= j , Bij = z(i,j) if branch (i, j) exists and P 1 zero otherwise and Bii = (g,h)∈Ei z(g,h) where Ei is the set of links involving bus i. To put this system of equations into a more useful form for our analysis, we dene the matrix Bf as an |L| × |N | matrix where row k corresponds to edge (i, j), (Bf )ki = 1/zk , (Bf )kj = −1/zk , and for h ∈ / {i, j}, (Bf )kh = 0 (zk is the line reactance of line k ). Then letting matrix D = Bf B , y = Db, and the constraints on production at the buses may be succinctly represented as: X bi = 0, (37) |Db| ≤ d, (38) where the second constraint is from the capacity d(i,j) of link (i, j). 29 In keeping with the MISO's optimization model (described in Ma et al. [2009a]), we consider losses approximated by linear functions of the injections into the network. The loss model, L-LIN, considers loss as a linear function of net production and consumption at each node. Positive loss parameters δi+ and δi− are assigned to each node i : X bi + X δi− b+ i + X δi− b− i = 0, (39) (40) |Db| ≤ d. There are more principled ways of incorporating loss which would also adjust matrices B and Bf to accurately account for ow changes from loss. However, since the goal is to infer the product of these matrices from actual data, these details are not too concerning. A.1 Flows in Numerical Example This is a numerical example of the construction of the DC ows from a description of the network. We assume no loss in this example. We have four nodes {1..4} connected into a single cycle by links L = {1 − 2, 2 − 3, 3 − 4, 1 − 4} (let these links have respective indices 1..4 and be referred to as ek ). Flow may go in either direction on these links (y may be negative) and have reactance z = 1 except for the link 1-4 where z = 2. Participant 1 Min:0 MW Max:500 MW Cost:$20/MWh 1 z=1 2 Limit: 50MW z=2 Participant 4 Min:-300 MW Max:-300 MW Participant 2 Min:0 MW Max:0 MW z=1 4 z=1 3 Connectivity may be described in the following adjacency matrix: 30 Participant 3 Min:0 MW Max:500 MW Cost:$30/MWh 1 1 0 1 1 0 1 1 1 1 0 1 1 0 . 1 1 This connectivity is reected in the admittance matrix: 1.5 −1 B= 0 −0.5 −1 −0.5 0 2 −1 −1 2 0 −1 0 . −1 1.5 Selecting bus 4 as the reference bus, we set θr = θ4 = 0. Then we set B−r as B less the row and column −1 x−r . corresponding to the reference bus. Then B−r θ−r = x−r . B−r may be inverted such that θ−r = B−r −1 Then dene Bθ as B−r with an additional row and column of zeros inserted at the position corresponding to the reference bus. In our example these matrices are: −1 B−r 1.2 0.8 0.4 0 1.2 0.8 0.4 0.8 1.2 0.6 0 and Bθ = = 0.8 1.2 0.6 0.4 0.6 0.8 0 0.4 0.6 0.8 0 0 0 0 . Finally, the angles may be concisely represented as θ = Bθ x. Now to get the ow on each link, consider connectivity matrix E as described in earlier documents, i.e., Eki = 1 if ek = (i, j) for some j , Ekj = −1 if ek = (i, j) and otherwise Eki = 0. Bf is simply matrix E row by row multiplied by 1/zk such that Bf θ = y as in Equation 36. Then, 1 0 E= 0 1 −1 0 1 −1 0 1 0 0 0 1 −1 0 0 and Bf = −1 0 −1 0.5 The ow over links can then be expressed as y = Bf θ = Bf Bθ x and 31 0 1 −1 0 1 0 0 0 0 . −1 −0.5 0.4 0.4 Bf Bθ = 0.4 0.6 −0.4 −0.2 0.6 −0.2 0.6 0.8 0.4 0.2 B B f θ Then D = . −Bf Bθ 32 0 0 . 0 0