Journal of Energy Markets
ISSN: 1756-3607 (print), 1756-3615 (online)
Editor-in-chief: Derek W. Bunn
Estimating marginal effects of key factors that influence wholesale electricity demand and price distributions in Texas via quantile variable selection methods
Abstract
Understanding the key drivers of prices and energy consumption is an important issue, which is complicated because the distributions of prices and consumption are asymmetric and fat-tailed. That is, the sets of relevant covariates can vary depending on the segment of interest in the conditional distributions of price and demand. Using a large data set from the Electric Reliability Council of Texas, this study uses quantile regressions and attendant variable selection methods to choose the most important factors that influence demand and price distributions; subsequently, the marginal effects of these factors are studied. Among the many findings, two critical ones are that the marginal effects of the covariates change throughout the distributions of demand and price, and that the number of relevant variables selected using mean regressions generally exceeds the number selected using quantile regressions. Related consequences for maintaining a reliable electricity market are discussed.
1 Introduction
1.1 Research motivation
Electricity tends to have the greatest price volatility of any commodity traded in a wholesale market. Since electricity cannot yet be economically stored in large quantities, prices in organized wholesale electricity markets change as system operators strive to match supply and demand by dispatching resources with varying marginal costs in real time to maintain reliability. The cost of generating and transmitting electricity fluctuates due to unanticipated changes in end-user demand, the availability of generating units to provide supply, transmission bottlenecks and many other factors. This volatility can impose significant costs and risks upon the buyers and sellers of electricity, and a variety of physical and financial hedging strategies have been developed to manage these risks. The variability in demand that contributes to price fluctuations may be the result of changes in the weather, changes in production levels at industrial facilities, a response to electricity prices, demand–response actions by load-serving entities or other factors. In addition to contributing to price volatility, demand fluctuations impose costs on an electricity market by necessitating an infrastructure sized to handle peak loads, investments in peak generation capacity and operating reserves.
A typical approach to modeling demand or prices in an electricity market is to formulate a simple linear or log-linear regression to explain the fifteen-minute or hourly prices or demand using a set of explanatory variables. To explain day-ahead market prices, such variables might include the projected level of demand, the price of the fuel associated with the generation source likely to be on-the-margin (eg, natural gas) and the expected levels of generation from baseload generation sources (eg, wind energy, solar energy and nuclear power plants). To explain real-time electricity prices, similar explanatory variables might be considered. Alternatively, one might apply the day-ahead market price along with errors in the projections used to explain the day-ahead prices. To explain electricity demand, weather and temporal variables (eg, time-of-day, month-of-year, residential-appliance-use patterns) are often applied; however, the demand side of electricity markets is becoming increasingly sensitive to sharp price fluctuations.
A limitation of simple regression models is their inability to recognize that various explanatory variables affect different parts of the distributions of prices and demand differently. They might provide acceptable explanatory power “on average” but may be poor at modeling the entire distributions of price and demand. A change in the generation level from a baseload power plant may have little impact on prices if prices are low or moderate, but it could result in a large jump or decline in prices if prices are already high. An additional cooling-degree hour may have little impact on demand if demand is low, but could have a large impact on demand during a heat wave. A change in price from USD20/MWh to USD30/MWh is unlikely to elicit any demand response, but a spike in prices to USD2000/MWh will elicit a response from the demand side of an electricity market. Thus, the relationships may, in fact, be highly nonlinear (as exemplified by a merit order curve) and different variables may have varying impacts on different levels of prices or demand.^{1}
^{1} The bid stacks and merit order curves may change every five minutes in the Electric Reliability Council of Texas (ERCOT) market. The US Energy Information Administration website provides a “generic” merit order curve displaying the nonlinear relationship between demand and prices (http://www.eia.gov/todayinenergy/detail.php?id=7590). Though this curve is not ERCOT specific, and is based on the marginal variable cost of resources (rather than offers), it nonetheless exemplifies the nonlinearity in the data.
1.2 Methodology overview
We shall model energy consumption and prices for each of the major zones (also called regions) in Texas: Houston, North, South and West. For the moment, without loss of generality, consider the following standard conditional mean regression equation for demand $y$ for the $i$th region, where for illustration we assume just one independent variable, say the day-ahead market (DAM) price, denoted by $x$:
$$y=\alpha +\beta x+e.$$  (1.1) 
The error, $e$, is normally distributed with mean $0$ and unknown standard deviation $\sigma $. The slope, $\beta $, is the sensitivity of demand to DAM price. To motivate the need for quantile regressions, consider the following limitations of (1.1).
1.3 Location shift model
The slope measures the impact of DAM price on the conditional mean of demand. Hence, under this assumption, the marginal effect of price is just a “location shift”, ie, the impact at the mean value of demand will be the same as the impact for the entire distribution of demand (Heckman et al 1997), and it has no effect on the scale or shape of the demand distribution. In the empirical analysis, it will be shown that this assumption is implausible for the demand data used in this paper. It is also implausible if we replace the left-hand-side variable $y$ by price, ie, conditional mean regression models for both price and consumption of electricity in Texas are inadequate.
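To make the location-shift restriction concrete, the following minimal Python sketch compares a low-price baseline with two hypothetical high-price regimes: one that moves every demand value by the same amount, and one that compresses the distribution. The data are synthetic and the helper names (`sample_quantile`, `quantile_effects`) are illustrative assumptions; none of this reproduces the paper's ERCOT analysis.

```python
import math

def sample_quantile(values, tau):
    """Order statistic at rank ceil(tau * n): a simple empirical quantile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))
    return ordered[rank - 1]

def quantile_effects(control, treated, taus):
    """Difference between the two groups' empirical quantiles at each tau."""
    return [sample_quantile(treated, t) - sample_quantile(control, t) for t in taus]

TAUS = (0.1, 0.5, 0.9)

# Hypothetical demand readings observed when the DAM price is low.
low_price_demand = [90, 94, 97, 99, 100, 101, 102, 104, 107, 111, 120]

# Regime A (pure location shift): every reading drops by 5 units.
location_shift = [v - 5 for v in low_price_demand]

# Regime B (scale change): readings are pulled halfway toward 100.
scale_shift = [100 + 0.5 * (v - 100) for v in low_price_demand]

location_effects = quantile_effects(low_price_demand, location_shift, TAUS)
scale_effects = quantile_effects(low_price_demand, scale_shift, TAUS)

print(location_effects)  # identical effect at every quantile
print(scale_effects)     # effect differs, and even changes sign, across quantiles
```

Under the pure location shift the effect is the same at every quantile, so a single mean coefficient summarizes it completely. Under the compressed regime the effect is positive in the lower tail and negative in the upper tail, which a conditional mean regression cannot convey.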
1.4 Complex features of demand distributions
Earlier, we noted energy economics reasons why covariates are likely to affect demand and price differently throughout their respective distributions. Thus, in our simple example above, it is possible that DAM prices (and other covariates) may influence the demand distribution in different ways. In addition, the statistical distribution of demand (and price) is leptokurtic (fat-tailed), skewed and influenced by outliers.
1.5 Varying relationships between demand and covariates
A static and linear relationship between demand and covariates such as DAM price is typically assumed in (1.1). However, such an assumption can be inconsistent with the entire demand and/or price distribution. In this regard, it is worth recalling a useful insight from Mosteller and Tukey (1977): “What the regression curve does is give a grand summary for the averages of the distributions corresponding to the set of $X$’s. We could go further and compute several different regression curves corresponding to the various percentage points of the distributions and thus get a more complete picture of the set. Ordinarily this is not done, and so regression often gives a rather incomplete picture. Just as the mean gives an incomplete picture of a single distribution, so the regression curve gives a corresponding incomplete picture for a set of distributions.” In addition, Koenker and Hallock (2001) state that “quantile regression is fast becoming a comprehensive strategy for completing the regression picture”. Koenker and Hallock studied income distributions conditioned on factors such as age, sex, years of education and poverty level. Each additional year of education will have a large effect in lower-income groups, but little or no effect in upper-income groups. This type of nonlinear response, based on various settings of the independent variables, leads to better explanatory and predictive inferences if we model the distribution of the response variable via quantile regressions.
1.6 Variable selection
Unlike the above mean regression example, in reality, multiple factors influence demand. Likewise, a model for price would also include many covariates. Variable selection is a difficult problem. If there are $p$ independent variables, then there are ${2}^{p}$ possible models, ie, each distinct subset of the variables constitutes a model. Clearly, even for a modest $p$ it would be impossible to sift through all possible models to arrive at the “best” model. Note that “best” does not mean the “true” model; the latter is a theoretical artifact that is seldom, if ever, attained in practice. Also, “best” here does not mean different functional forms involving the covariates. However, we can still try to identify the “best” possible subset of variables that are most relevant to model the dependent variable of interest. Several criteria – such as Akaike, Bayesian and deviance information criteria, and forward selection – have been proposed to define such a “best” subset. Model selection in quantile regression has two interesting features. First, in the special case of the median regression it can be seen as a way of achieving robustness in variable selection. Second, in many economics data sets (including the one in this study) heterogeneity exists due to either heteroscedastic variance or covariate effects that are influenced not just by the location of the data distribution but also by higher-order moments such as skew and kurtosis. Thus, the sets of relevant covariates can vary depending on where we are on the conditional distribution of the dependent variable given covariate information. At the outset, we note that in this paper our focus is on in-sample variable selection.
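The growth of the model space can be illustrated with a short Python sketch that enumerates every subset of a small covariate list and prints the count for larger $p$. The covariate names are hypothetical labels chosen for illustration only.

```python
from itertools import combinations

def candidate_models(variables):
    """Yield every subset of `variables`, from the empty model to the full model."""
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            yield subset

# Hypothetical labels for four demand covariates.
demand_covariates = ["temperature", "transmission_price", "summer", "peak_hour"]
models = list(candidate_models(demand_covariates))
print(len(models))  # 2**4 subsets, including the empty model

# The count doubles with each additional covariate.
for p in (9, 13, 30):
    print(p, 2 ** p)
```

With nine or thirteen covariates an exhaustive search is still feasible, but the count passes one billion at $p=30$, which is why penalized or criterion-guided searches are used instead.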
One of the most robust methods to handle variable selection in quantile regressions is the well-known Bayesian information criterion (BIC) (see Schwarz 1978). In the popular linear mean regression model, the BIC has been successful because it is known that the best subset selection with the BIC identifies the true model consistently (Nishii 1984). Modified forms of the BIC within a quantile regression framework that account for large model spaces and large sample sizes are known to have sound statistical properties (see Machado 1993; Wu and Liu 2009; Wang et al 2012). Many of these improvements to the BIC are predicated on developing robust shrinkage estimation methods, such as the least absolute shrinkage and selection operator (Lasso) and smoothly clipped absolute deviation (SCAD) (see Wang et al 2007a, b, 2012; Lee et al 2014).
In the energy literature, several researchers have contributed to the use of quantile regressions (see, for example, Bessa et al (2012), He et al (2016), Maciejowska et al (2016), Hagfors et al (2016a, b), Cabrera and Schulz (2017), Li et al (2017), Lebotsa et al (2018), Taylor (2019), and the many additional references therein). Most of these studies consider forecasting electricity demand or prices; in the United Kingdom, for example, Hagfors et al (2016a) examine marginal effects for electricity price data but not for demand. None of the studies attempts variable selection at every quantile using BIC. Our goal is not forecasting; it is, first, to understand the impact of key covariates on the distributions of both day-ahead and real-time prices as well as electricity demand in Texas, and second, to apply variable selection methods to the various quantiles of these distributions. Identifying the most important variables would help practitioners to better understand the changing marginal effects of relevant – and differing – variables across these distributions. For instance, it is possible that the effect of natural gas prices on DAM prices is significantly different from its effect on real-time market (RTM) prices; moreover, within the DAM price distribution, natural gas prices could have differing impacts. Consider another reason why certain variables may be more relevant than others: at times, binding transmission constraints among the different zones in Texas lead to a divergence in zonal prices, and the generation resources within each zone then play a greater role in determining the prices within that zone. For example, non-dispatchable wind generation that is concentrated in West Texas will occasionally set the market-clearing price in the West zone, when transmission limits prevent the export of wind generation to neighboring zones. Finally, we look at whether we should use mean or median regressions to model demand and price of electricity.
1.7 Paper findings
We explore the impact of key covariates on the distributions using hourly data for the years 2015–17 from the Electric Reliability Council of Texas (ERCOT) market. Divided into four major zones – Houston, North, South and West – ERCOT serves 85% of the electrical needs of the state with the largest electricity consumption in the United States; it accounts for about 8% of the nation’s total electricity generation, and is repeatedly cited as North America’s most successful attempt to introduce competition into both the generation and retail segments of the power industry (Treadway 2015).^{2} The main findings are the following. First, we show that median (more generally, quantile) rather than mean regressions are better suited to model energy consumption and price distributions in ERCOT. Second, for each of the four zones, all the key marginal effects vary throughout the distributions of demand and price. Third, under both the demand and pricing models, a larger set of variables generally tends to be selected using a mean regression than is obtained using a median regression. Thus, even from a parsimony perspective, median regressions are better. Fourth, for the demand models in the four zones, the same set of independent variables is deemed relevant by the BIC procedure. This is a robust finding. Fifth, for the four DAM price models, almost the same set of variables is relevant, providing yet another robust conclusion regarding the “best” model. Finally, for the four RTM pricing models, both the mean and median regressions yield the same set of relevant independent variables; that is, the “best” models coincide. However, the median regression maximum likelihood estimation (MLE) values are less biased than the corresponding ordinary least squares (OLS) estimates for these independent variables.
^{2} Strictly speaking, there are eight ERCOT zones, wherein South encompasses three smaller subzones (Austin, LCRA, San Antonio) and North includes a tiny area (Rayburn). From a modeling perspective, there is no loss in generality by focusing on the four major zones, which account for close to 85% of ERCOT’s load.
1.8 Overview of the paper
Section 2 describes the data and variables used in the study. The quantile regression model and the variable selection process are detailed in Section 3. Section 4 provides a comprehensive empirical analysis for the Houston zone; the results are similar for the remaining three zones in Texas. A brief discussion is given in Section 5.
2 Data and variables
This section describes the data used in the analytic models, including geographical scope and sample period. All data is publicly available. ERCOT regularly posts data pertaining to prices, demand and generation by fuel type on its website (http://www.ercot.com). Additional archived historical data was obtained from ERCOT through data requests. The Henry Hub natural gas price data was obtained from the US Energy Information Administration’s website.
2.1 Geographical scope and graphical insights
See Figure 1 for a map of ERCOT. The North and Houston zones account for about 37% and 27%, respectively, of ERCOT market energy sales, while the South and West zones contribute 12% and 9%. Further, these four zones account for nearly all of the state’s retail competition, and most of the competitive generation resides within these zones. The input mix to electricity production in Texas is shown in Figure 2. In the sample period considered, Texas witnessed a rise in wind generation and a decline in coal generation. This is consistent with the state’s policy to reduce the negative impact of energy sources that could harm the environment. West Texas has the largest wind-generation operations in the state due to favorable weather conditions for that energy source.
Figure 3 shows the system-wide demand in MWh by hour, while Figure 4 shows the average energy consumption in MWh by month for the entire sample period. In Texas, these demands typically peak in the summer months, particularly in the afternoons. In Figure 3, the latter feature is evidenced by demand rising during daylight hours until it reaches a maximum in the afternoon before dropping off. From Figure 4, it is clear that the average load is largest in the summer months.
Figures 5 and 6 provide the distribution of prices (as box plots) by hour and month, respectively. What is striking is that, while most prices are in the USD25 to USD30 range (25th to 75th percentiles), there are several instances of very high prices. This is one of the reasons why the quantile regression methodology and the associated variable selection methods detailed in this paper are useful. The variables that impact these price distributions at the extreme quantiles are likely to be different than those in the middle.
Overall, this study uses a very rich and large database to better understand the key variables that impact DAM and RTM prices and wholesale energy demand. Price and demand in the ERCOT market have been analyzed in a variety of prior studies using many of the same variables and data sources employed here, including Woo et al (2011, 2012), Zarnikau et al (2014, 2016, 2019a) and Tsai and Eryilmaz (2018). The sets of independent variables used to model demand and price in this paper are based on these prior studies.
Note that we do not model price and demand simultaneously in this paper. In a related study, Damien et al (2019) find that for some regions in Texas the simultaneous relationship does not hold. They also show that a different approach to modeling simultaneous relationships may be viable in general. To the best of our knowledge, there is no econometrics or statistics literature showing an easy way of modeling simultaneous quantile regression equations.
2.2 Sample period and variables
The sample period is from January 1, 2015 to December 31, 2017. In this time frame, the data were analyzed at the hourly load level; that is, for each hour in a twenty-four-hour cycle, complete data on all the variables used in the analysis was employed, leading to a very large data set for each of the four zones. For each region, we consider three models (models 1, 2 and 3) in which wholesale demand (also called energy consumption or load in MWh), DAM price (USD/MWh) and RTM price (USD/MWh) are the dependent variables, respectively. Thus, twelve quantile regressions are executed; note that this also nets twelve median regressions, since we select the median (also called the 50th percentile) in our analysis as one of the quantiles. In addition, we also implement appropriate mean regressions, using OLS, to provide a comparative analysis.
Table 1: Summary statistics for load (MWh), by zone.
                                            Percentile
Zone     Mean    SD     Skew  Kurtosis  5th    25th    50th    75th    95th
Houston  10 923  2 481  0.86  0.14      7 698  9 157   10 314  12 278  15 993
North    14 243  3 757  0.93  0.37      9 298  11 721  13 305  16 137  22 109
South    5 309   1 246  0.71  -0.12     3 635  4 386   5 052   6 060   7 772
West     3 645   492    0.82  0.29      2 979  3 285   3 545   3 918   4 651

Table 2: Summary statistics for DAM price (USD/MWh), by zone.
                                           Percentile
Zone     Mean   SD     Skew   Kurtosis  5th    25th   50th   75th   95th
Houston  26.45  27.09  37.5   2 460     12.47  18.24  22.28  28.61  51.04
North    23.85  24.86  47.19  3 415     12.02  17.72  20.97  25.82  42.16
South    26.68  26.74  38.91  2 600     12.18  18.5   22.73  29.45  50.82
West     24.21  25.71  43.75  3 067     10.62  17.50  21.04  26.68  45.10

Table 3: Summary statistics for RTM price (USD/MWh), by zone.
                                           Percentile
Zone     Mean   SD     Skew   Kurtosis  5th    25th   50th   75th   95th
Houston  25.93  46.85  19.84  550       12.16  17.57  20.15  24.77  39.91
North    23.22  24.32  13.77  274       11.24  17.29  19.78  23.66  38.01
South    25.88  37.14  12.84  237       11.20  17.39  20.07  24.63  46.02
West     23.38  27.31  12.05  215       4.52   16.90  19.69  23.78  42.39
Tables 1–3 contain the summary statistics, by zone, for load, DAM price and RTM price variables, respectively. It is evident that all three distributions are asymmetric. Given that the kurtosis for a normal distribution is zero, the kurtosis values for the price distributions suggest highly leptokurtic (heavy-tailed) shapes, while the load distribution has tails that are slightly heavier than the normal for three of the four zones (South has lighter tails than the normal distribution). Note also that the prices are very large, especially DAM prices. This further suggests that modeling the entire distribution of prices could give useful insights. Indeed, estimates of marginal effects from mean regressions are likely to be biased since the conditional expectations of the response variable in such models will be stretched in the direction of the asymmetry.
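For readers who wish to reproduce statistics of this kind, the sketch below computes the mean, standard deviation, skew, excess kurtosis (so that a normal distribution scores zero, as in the text) and selected percentiles. It assumes simple population-moment estimators and an order-statistic percentile rule; the estimators used for Tables 1–3 are not stated in the paper and may differ. The price series is synthetic.

```python
import math

def order_statistic_quantile(values, tau):
    """Order statistic at rank ceil(tau * n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(tau * len(ordered)))
    return ordered[rank - 1]

def summarize(values):
    """Population-moment summary; excess kurtosis is zero for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": math.sqrt(m2),
        "skew": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
        "percentiles": {p: order_statistic_quantile(values, p / 100)
                        for p in (5, 25, 50, 75, 95)},
    }

# A symmetric series: zero skew and lighter-than-normal tails.
symmetric = summarize([1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical hourly prices (USD/MWh) with a single spike.
spiky_prices = summarize([18.0, 19.0, 20.0, 20.0, 21.0, 22.0,
                          22.0, 23.0, 25.0, 28.0, 400.0])

print(symmetric)
print(spiky_prices)
```

A single spike is enough to pull the mean well above the median and to produce positive skew and excess kurtosis, which is the pattern seen in the price tables.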
Model 1: wholesale energy consumption (or demand) equations
Consider the following nine variables for each of the four energy consumption models.
Temperature (Fahrenheit): this records the temperatures at a major city within each zone. Clearly, hot or cold temperatures increase electricity demand. The relationship between temperature and demand tends to be nonlinear; sometimes, depending on the level of demand, it could result in a spike. It is nonlinear because both very hot and very cold temperatures lead to an increase in demand. Temperature indirectly affects prices; extreme temperatures increase demand, which increases prices (see also Figures 3 and 4).
Transmission price: this cost is typically a response of load-serving entities and large industrial energy consumers, based on contributions to system peak demand in four summer months (also called four coincident peaks (4CP)). ERCOT’s staff analysis suggests demand could potentially fall by over 1000 MW during a 4CP period.^{3} In reality, since transmission price is based on the four highest demand readings, it is not a DAM or RTM phenomenon; as such it cannot be calculated until the end of a summer. This makes it difficult to know which fifteen-minute intervals to use in the calculation until each month is complete. Typically, we would expect the slope coefficient to be negative, as transmission prices are charged to large industrial energy consumers and load-serving entities during 4CPs. However, based on the preceding description, there is considerable uncertainty in this transmission price data and we can expect significant fluctuation in parameter estimates.
^{3} See “Analysis of load reductions associated with 4CP transmission charges and price responsive load/retail DR: Raish’s presentation to the ERCOT Demand Side Working Group”. URL: http://www.ercot.com/calendar/2017/3/24/115556DSWG.
Summer dummy: June, July and August are coded as 1 in the binary representation of this monthly variable; other months are coded as 0 (see also Figure 4).
Hour dummy: this binary variable measures the impact of extra demand during the peak hours of 16:00 to 18:00 each day (see also Figure 3). Generally, this variable’s marginal effect would be positive. Household consumption of energy tends to increase during these hours, when people return home. But, depending on the zonal temperatures (especially in the West and North), we can expect instances where a negative effect might result.
Lagged DAM price: if DAM prices are high on a given day, this could lead to a reduction in demand the following day. A similar argument can be made for RTM prices.
Lagged load: concurrent days of high demand are encapsulated via a one-day lag; the marginal effect here is likely to be positive.
Moving average of RTM price: two-hour moving averages of RTM prices are used in the DAM model for the reasons given in Section 1. In the short term, the rate of change due to this variable can be positive or negative, depending on the zonal and system-wide increase in demand requirements. Some industrial customers may cut back production even when RTM prices reach, say, USD300, whereas others might react only if prices spike to very high levels.
Price dummy for spike at USD300: some mid-sized industrial customers tend to scale back production if RTM prices exceed USD300. This binary variable captures the impact of such customers.
Price dummy for spike at USD2000: since we also include lagged load as an independent variable in the model, it would be prudent not to overfit by using too many dummy variables. However, the price distributions shown in the summary tables include some extremely large values. These occur so infrequently that it would be useful to study their marginal effect on energy consumption. The cutoff for this tail value is set at USD2000. We hope to capture the downward push on demand, if any, via this dummy variable.
Both of the price dummy variables above were selected after carefully exploring the graphical and tabular summaries. Our list of possible explanatory variables, including these dummy variables, is consistent with those used in the prior studies cited earlier.
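The derived covariates in the list above (the dummies, lags and moving average) can be sketched as follows. This is an illustration under stated assumptions, not the authors' data pipeline: the record layout and field names are hypothetical, the peak-hour dummy is assumed to flag the hours beginning at 16:00 and 17:00, the one-day lag is assumed to mean the same hour on the previous day, and the two-hour moving average is assumed to use the current and preceding hour. Temperature and transmission price come from external sources and are omitted.

```python
from datetime import datetime, timedelta

SUMMER_MONTHS = {6, 7, 8}   # June, July and August
PEAK_HOURS = {16, 17}       # assumption: hours beginning 16:00 and 17:00
LAG_HOURS = 24              # assumption: one-day lag = same hour, previous day

def build_demand_rows(records):
    """Turn hourly records (dicts with time, load, dam_price, rtm_price)
    into rows holding the derived covariates of the demand model."""
    rows = []
    for i in range(LAG_HOURS, len(records)):
        now, prev_hour, prev_day = records[i], records[i - 1], records[i - LAG_HOURS]
        rows.append({
            "time": now["time"],
            "load": now["load"],
            "summer": int(now["time"].month in SUMMER_MONTHS),
            "peak_hour": int(now["time"].hour in PEAK_HOURS),
            "lag_dam_price": prev_day["dam_price"],
            "lag_load": prev_day["load"],
            "rtm_ma2": (now["rtm_price"] + prev_hour["rtm_price"]) / 2.0,
            "spike_300": int(now["rtm_price"] > 300.0),
            "spike_2000": int(now["rtm_price"] > 2000.0),
        })
    return rows

# Two synthetic days of hourly data with one RTM price spike at 16:00 on day 2.
start = datetime(2016, 7, 1, 0)
records = [
    {"time": start + timedelta(hours=h),
     "load": 10000.0 + 100.0 * (h % 24),
     "dam_price": 25.0,
     "rtm_price": 2500.0 if h == 40 else 20.0}
    for h in range(48)
]
rows = build_demand_rows(records)
spike_row = rows[40 - LAG_HOURS]
print(spike_row)
```

The first day is consumed by the lag, so 48 hourly records yield 24 usable rows; the spike hour sets both price dummies to 1.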
Model 2: DAM price equations
Supply offers begin at 06:00 and end at 10:00 on the previous day for ERCOT’s day-ahead price determination by 13:30. Prices for the current day are set around 18:00, after ERCOT completes the unit commitments necessary for grid reliability. As actual wind generation and total system demand on the current day are unknown on the previous day, the DAM prices for energy depend on ERCOT’s day-ahead forecasts of wind generation, hourly loads and expectations regarding the availability of generating capacity of various fossil fuels. To model these day-ahead prices for the four zones, we consider the following thirteen independent variables.
Natural gas price (NGP): this is the price of natural gas at the Henry Hub in Louisiana in US dollars per million British thermal units (USD/MMBtu). The daily settlement price is applied to all hours of the day.
Wind-powered forecast (wind): short-term wind power forecast and wind-powered generation resource production potential are used to compute this forecast in MWh.
System-wide nuclear generation (NUC): in MWh.
Ancillary service prices: REGUPP, RRSP and NSPINP (USD/MW) correspond, respectively, to regulation up, responsive reserves and non-spinning reserves. These are the prices (set in the DAM) of various operating reserves. This is generation capacity or demand response that can provide a cushion when load forecasts are inaccurate or generation resources deviate from expected levels.
Ancillary service quantity: REGUPQ, RRSQ and NSPINQ correspond, respectively, to regulation up, responsive reserves and non-spinning reserves; these are the quantities of various operating reserves.
DAM prices from other zones: prices will be equal in the absence of transmission constraints. But, as consumption rises and constraints on interzonal transmission of electricity emerge, prices will diverge. In each price equation, this will net three unique variables.
Load forecast (LF) by zone: the demand projection (in MWh) released by ERCOT for each zone at 06:00 on the day prior to the operating day, which coincides with the start of the DAM.
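As a bookkeeping aid, the following sketch assembles the covariate list for each zone's DAM price equation and confirms that it contains thirteen entries. The column labels (for example `DAM_North` and `LF_Houston`) are hypothetical names introduced here; only the acronyms NGP, NUC, LF and the ancillary service codes come from the text.

```python
ZONES = ("Houston", "North", "South", "West")

# System-wide covariates shared by all four DAM price equations.
COMMON = ["NGP", "WIND", "NUC",
          "REGUPP", "RRSP", "NSPINP",   # ancillary service prices
          "REGUPQ", "RRSQ", "NSPINQ"]   # ancillary service quantities

def dam_covariates(zone):
    """Covariate labels for one zone's DAM price equation (hypothetical names)."""
    other_zone_prices = [f"DAM_{z}" for z in ZONES if z != zone]
    return COMMON + other_zone_prices + [f"LF_{zone}"]

for zone in ZONES:
    names = dam_covariates(zone)
    print(zone, len(names), names[-4:])
```

Each equation uses the other three zones' DAM prices, never its own, so the nine common variables, three cross-zone prices and one zonal load forecast give thirteen covariates.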
Model 3: RTM price equations
For the realtime prices in each of the four zones, the following seven variables are considered.
(1) The DAM price in each zone.
(2) The load forecasting error (LFE) in each zone, provided by ERCOT.
(3) Other regions’ total load forecasting error (ORTLFE); this is a simple calculation using the original data.
(4) RTM prices from other zones. Like DAM prices, RTM prices will be equal in the absence of transmission constraints. But, as demand rises and constraints on interzonal transmission of electricity emerge, prices will diverge. In each price equation, this will net three unique variables.
(5) Wind-generation forecasting error (WFE) in each zone.
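The paper does not spell out the ORTLFE calculation. One plausible reading, sketched below, is that for a given zone it sums the load forecasting errors of the other three zones in the same hour. Both that definition and the sign convention for the errors are assumptions made for illustration, and the numbers are invented.

```python
def other_regions_total_lfe(lfe_by_zone, zone):
    """Sum of the load forecasting errors of every zone except `zone`.
    Assumed definition of ORTLFE; the paper only calls it a simple calculation."""
    return sum(error for name, error in lfe_by_zone.items() if name != zone)

# Hypothetical load forecasting errors (MWh) for a single hour.
lfe = {"Houston": 120.0, "North": -80.0, "South": 35.0, "West": 10.0}

ortlfe = {zone: other_regions_total_lfe(lfe, zone) for zone in lfe}
print(ortlfe)
```

Under this reading, a zone's own error and ORTLFE always add up to the system-wide error, so the two variables separate local from external forecast surprises.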
3 Linear quantile regression model and its Bayesian information criterion
Following Koenker and Bassett (1978, 1982), Koenker and d’Orey (1987) and Damette and Delacote (2012), and omitting the time subscript as it is not needed in the estimation process, the linear quantile regression model for the $i$th observation is given by
$${y}_{i}={\boldsymbol{X}}_{i}^{\mathrm{T}}{\boldsymbol{\beta}}^{*}+{e}_{i},\quad i=1,\mathrm{\dots},n,$$  (3.1)
where the ${e}_{i}$ are independent and identically distributed, $P({e}_{i}\le 0\mid {\boldsymbol{X}}_{i}=\boldsymbol{x})=\tau $ for almost every $\boldsymbol{x}$, ${\boldsymbol{X}}_{i}={({X}_{i1},\mathrm{\dots},{X}_{ip})}^{\mathrm{T}}$ and ${\boldsymbol{\beta}}^{*}={({\beta}_{1}^{*},\mathrm{\dots},{\beta}_{p}^{*})}^{\mathrm{T}}$. The aim is to retain only ${d}^{*}$ independent variables among the ${X}_{ij}$s, which implies that the coefficients of the remaining $p-{d}^{*}$ covariates are zero in (3.1). The choice of the error distribution is predicated on the data. In the first instance, when we model the demand, DAM and RTM price distributions using all of their respective covariates, we do not assign any parametric model for the error term (see Koenker and Bassett 1978). In other words, we allow the distribution of the dependent variable to be nonparametric. This allows us to assess the type of relationship each covariate has with the dependent variables under models 1–3 without making any distributional assumptions about the data. Once this is done, we pose the following question: under models 1–3, which independent variables are most relevant? To implement this variable-selection step, we use a special case of (3.1), namely median regression, given that the data is highly asymmetric and heavy tailed (see Tables 1–3). To employ the BIC metric, following Lee et al (2014), we assume that each ${e}_{i}$ follows an asymmetric Laplace distribution (ALD) whose density function is given by
$$f(e)=\tau (1-\tau ){\sigma}^{-1}\mathrm{exp}\left(-\frac{{\rho}_{\tau}(e)}{2\sigma}\right),$$  (3.2)
where ${\rho}_{\tau}(u)=2u(\tau -I(u<0))$, ${e}_{i}$ is independent of ${\boldsymbol{X}}_{i}$ and $I$ is the indicator function. Thus, the conditional $\tau $-quantile of ${y}_{i}$ given ${\boldsymbol{X}}_{i}={\boldsymbol{x}}_{i}$ is ${\boldsymbol{x}}_{i}^{\mathrm{T}}{\boldsymbol{\beta}}^{*}$. Note that the ALD is a very flexible family suited to the application at hand, in light of the summary statistics shown in Tables 1–3.
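Two properties of the ALD can be checked numerically with a short sketch: the density integrates to one with probability $\tau$ below zero, and the constant that minimizes the summed check loss is a sample quantile. The sketch assumes the check loss is scaled as $\rho_{\tau}(u)=2u(\tau-I(u<0))$, which is the scaling under which a density with $2\sigma$ in the exponent integrates to one; the demand values and the values of $\tau$ and $\sigma$ are arbitrary illustrations.

```python
import math

def check_loss(u, tau):
    """Check loss, assumed scaled as 2u(tau - I(u < 0))."""
    return 2.0 * u * (tau - (1.0 if u < 0 else 0.0))

def ald_density(e, tau, sigma):
    """Asymmetric Laplace density with 2*sigma in the exponent."""
    return tau * (1.0 - tau) / sigma * math.exp(-check_loss(e, tau) / (2.0 * sigma))

def midpoint_integral(f, lower, upper, steps=80000):
    """Plain midpoint-rule quadrature; adequate for this smooth integrand."""
    width = (upper - lower) / steps
    return width * sum(f(lower + (k + 0.5) * width) for k in range(steps))

tau, sigma = 0.9, 1.5
total_mass = midpoint_integral(lambda e: ald_density(e, tau, sigma), -400.0, 400.0)
mass_below_zero = midpoint_integral(lambda e: ald_density(e, tau, sigma),
                                    -400.0, 0.0, steps=40000)
print(round(total_mass, 4), round(mass_below_zero, 4))

# The constant minimizing the summed check loss is a sample tau-quantile.
demand = [90, 94, 97, 99, 100, 101, 102, 104, 107, 111, 120]
best_constant = min(demand, key=lambda q: sum(check_loss(v - q, tau) for v in demand))
print(best_constant)
```

With $\tau=0.9$ the minimizer is the tenth of the eleven ordered values, which is the link between the check loss and the $\tau$-quantile used throughout this section.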
Adapting the notation from Lee et al (2014), let $S=\{j_{1},\dots,j_{d}\}\subset \{1,\dots,p\}$ denote a candidate model corresponding to the independent variables $X_{j_{1}},\dots,X_{j_{d}}$: define $\boldsymbol{X}_{S}=(X_{j_{1}},\dots,X_{j_{d}})^{\mathrm{T}}$, let $|S|$ be the cardinality $d$ of $S$ and let $(\widehat{\beta}_{S},\widehat{\sigma})$ be the maximum likelihood estimator of $(\beta_{S},\sigma )$. Then, the BIC for a linear quantile regression is given by
$$\text{BIC}=\log\left(\sum _{i=1}^{n}\rho_{\tau}(y_{i}-\boldsymbol{X}_{iS}^{\mathrm{T}}\widehat{\beta}_{S})\right)+|S|\frac{\log n}{2n}C_{n},$$  (3.3) 
where $C_{n}$ is a positive constant that diverges to infinity as $n$ increases. Lee et al (2014) study the theoretical properties of (3.3) and show that it includes other forms of BIC as special cases; importantly, they show that this generalized BIC consistently identifies the true model in high-dimensional quantile regression models.
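As a rough illustration of how (3.3) trades fit against model size, the sketch below evaluates the criterion for a given set of residuals. The function name `quantile_bic`, the default `c_n=1.0` and the residual values are assumptions made for illustration; the check loss again uses the convention $\rho_{\tau}(u)=u(\tau-I(u<0))$.

```python
import math


def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0)); this scaling is an assumption."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_bic(residuals, tau, n_selected, c_n=1.0):
    """Evaluate log(sum of check losses) + |S| * log(n) / (2n) * C_n, as in (3.3).

    c_n=1.0 is a placeholder; the source only requires C_n to be positive
    and to diverge with n.
    """
    n = len(residuals)
    total = sum(check_loss(r, tau) for r in residuals)
    return math.log(total) + n_selected * math.log(n) / (2 * n) * c_n


# Hypothetical residuals from some fitted median regression.
residuals = [1.0, -2.0, 0.5, -0.5]

bic_small = quantile_bic(residuals, tau=0.5, n_selected=2)
# Same fit quality but one extra variable: the penalty term raises the BIC.
bic_large = quantile_bic(residuals, tau=0.5, n_selected=3)

print(round(bic_small, 4), round(bic_large, 4))
```

An additional covariate is therefore retained only if it lowers the log of the summed check loss by more than the increment in the penalty term.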
3.1 Variable selection using $\text{BIC}(S)$
In the absence of variable selection, assuming the ALD for the error term, we estimate the regression parameters by minimizing the weighted absolute values of the residuals at every quantile level. This is a slight variation from the original estimator proposed by Koenker and Bassett (1978). In other words, the choice of the ALD model for the error does not change the optimization problem developed by Koenker and Bassett. What changes is the likelihood function, which is predicated on the asymmetric Laplace density. Extending this parameter estimation to variable selection, via the BIC, is straightforward; it proceeds by simply introducing an extra penalty parameter. To this end, consider the following.
Let
$${\widehat{\beta}}_{\lambda}={({\widehat{\beta}}_{\lambda ,1},\mathrm{\dots},{\widehat{\beta}}_{\lambda ,p})}^{\mathrm{T}}.$$ 
The process of selecting the best subset model proceeds by choosing $\lambda >0$ as follows:
$$\widehat{\lambda}=\underset{\lambda}{\arg\min}\left\{\log\left(\sum _{i=1}^{n}\rho_{\tau}(y_{i}-X_{iS}^{\mathrm{T}}\widehat{\beta}_{\lambda})\right)+|\widehat{S}_{\lambda}|\frac{\log n}{2n}C_{n}\right\}.$$  (3.4) 
The selected subset $\widehat{S}_{\lambda}\equiv \{j:\widehat{\beta}_{\lambda ,j}\ne 0,\ 1\le j\le p\}$ then comprises those variables that are useful at the $\tau$th quantile of the distribution of the dependent variable $y$. All that remains is to find an estimator for $\widehat{\beta}_{\lambda}$. In practice, the two most widely used ones are the Lasso and SCAD estimators. Software packages (such as R) offer both alternatives. Simulation studies generally show that when the number of independent variables is moderate, say fewer than 20, both methods yield comparable conclusions. That is, the “best” models are generally the same (see Tibshirani (1996) and Fan and Li (2001) for additional discussion of this point). For the application considered here, we let $\lambda$ vary between 0.01 and 1. If a particular independent variable’s coefficient estimate is less than 0.001, it is set to zero, which means it is not useful in the model. We use the Lasso procedure in R (Sherwood and Maidman 2017). When the code converges for the optimal $\lambda$, usually in a matter of minutes, the module outputs the corresponding estimates for $\widehat{\beta}_{\lambda}$ and the BIC values at each of the selected quantiles. We select $\tau =\{0.01,0.05,0.25,0.50,0.75,0.95,0.99\}$ to better capture the impact of the covariates at various positions along the entire distributions of price and demand, including the tails of the respective distributions. In particular, we obtain the median regression at the 50th percentile.
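The selection step in (3.4) can be sketched as follows. This is not the rqPen code used in the paper: the sketch does not fit the Lasso itself but assumes a penalized quantile solver has already returned one coefficient vector per value of $\lambda$. The data, the coefficient path, the helper names and `c_n=1.0` are all hypothetical, and applying the 0.001 cutoff to the absolute value of each coefficient is an assumption.

```python
import math


def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0)); this scaling is an assumption."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def threshold(beta, tol=1e-3):
    """Set coefficients below tol in absolute value to zero (absolute value assumed)."""
    return [b if abs(b) >= tol else 0.0 for b in beta]


def bic_for_fit(X, y, beta, tau, c_n=1.0):
    """Return the BIC-type criterion of (3.4) and the indices of nonzero coefficients."""
    n = len(y)
    loss = sum(
        check_loss(yi - sum(b * x for b, x in zip(beta, xi)), tau)
        for xi, yi in zip(X, y)
    )
    support = [j for j, b in enumerate(beta) if b != 0.0]
    return math.log(loss) + len(support) * math.log(n) / (2 * n) * c_n, support


def select_lambda(X, y, path, tau, c_n=1.0):
    """path maps each lambda to the coefficients a penalized solver returned for it."""
    scored = []
    for lam, beta in path.items():
        bic, support = bic_for_fit(X, y, threshold(beta), tau, c_n)
        scored.append((bic, lam, support))
    return min(scored)  # smallest criterion value wins


# Hypothetical data: y depends on the first column only.
X = [[1.0, 0.5, -1.0], [2.0, -0.3, 0.4], [3.0, 0.8, 0.2],
     [4.0, -0.6, -0.7], [5.0, 0.1, 0.9], [6.0, -0.9, 0.3]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# Hypothetical Lasso path over a lambda grid between 0.01 and 1.
path = {
    0.01: [1.99, 0.05, 0.02],
    0.1: [1.98, 0.0004, 0.0],
    1.0: [1.2, 0.0, 0.0],
}

best_bic, best_lambda, selected = select_lambda(X, y, path, tau=0.5)
print(best_lambda, selected)
```

In this toy path the middle value of $\lambda$ is chosen: the lightly penalized fit has a slightly smaller loss but pays for two extra variables, while the heavily penalized fit shrinks the one relevant coefficient too far.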
4 Empirical analysis
To better focus the results, we provide copious details for the Houston region using energy consumption (also called load demanded) as the dependent variable; this is what was labeled as model 1. Since the analyses for the DAM and RTM prices as dependent variables proceed similarly, we only report key summaries for them. Once this is accomplished, we summarize the results for the demand, DAM and RTM price models for the remaining three regions. Collectively, these provide inferences for most of ERCOT’s service area, depicted in Figure 1.
4.1 Quantile and mean regressions for Houston demand
Where appropriate, we will employ the following terminology: the quantile curves of the regression coefficients with a long tail at the lower quantiles are called “floor effects”, whereas those with a long tail at the upper quantiles are called “ceiling effects”.^{4} [Footnote 4: The phrases “floor effect” and “ceiling effect” have stylized meanings in different areas of research. Here, we use them to delineate sharp differences in how the demand at the opposite tails of the demand distributions responds to the various covariates.] In Figure 7, the horizontal axis shows the quantiles of the dependent variable and the vertical axis represents parameter estimates. The black dots are the point estimates at each of the seven quantiles (the $\tau$ values) that lie on the dashed quantile curves. The gray-shaded areas are the 95% confidence bands for these functional estimates. The solid black horizontal line is the mean regression estimate, and the dashed red lines are the 95% confidence intervals for the mean. We now discuss each of the panels in Figure 7, starting with the following important observations about all the panels.
 • For each independent variable, the quantile regression coefficient estimates almost always lie outside the corresponding mean regression 95% confidence intervals. This implies that the “location shift” interpretation of the marginal effects of these variables on the demand for electricity in Houston is implausible.
 • In the interests of space we do not show similar demand quantile plots for other regions, or the quantile plots for the eight pricing models (four for DAM and four for RTM). Barring a few exceptions, most of the plots are similar to Figure 7 and are available from the authors on request. Thus, a key empirical claim made at the outset is validated.
Let us now examine each of the panels in Figure 7.
 Intercept: the non-normality of the distribution of demand in Houston is evident, as the quantile curve is not a line through the origin. The intercept is also the conditional quantile function for some representative case of the various covariates.
 Temperature: the ceiling and floor effects of temperature on demand are different. The marginal effect of temperature can vary anywhere from roughly 85 MWh in the lowest quantiles to 10 MWh in the uppermost quantiles of the demand distribution, depicting a convex relationship between temperature and demand. In contrast, the linear mean effect is fixed at roughly 20 MWh throughout the distribution of demand.
 Transmission: all else being fixed, we would expect this slope coefficient to be negative, as transmission prices are charged to large industrial energy consumers and load-serving entities during 4CPs. However, as noted in Section 2, there is considerable uncertainty in this transmission data, and we can expect significant fluctuation in parameter estimates. And indeed this is the case. The disparity between transmission prices at the lower tails and upper tails is significant, as they change from USD2500 in the lower tails to $-$USD500 or so in the upper tails. The mean estimates severely underestimate these costs in the range of the floor effects.
 Summer: here, the floor effect is negative, whereas the ceiling effect is positive. Summer months (June, July and August) have little effect compared with non-summer months when demand is low, while they tend to have a substantial positive impact when demand is high. This is consistent with reality, as surges in demand are associated with heat waves in the summer months in Texas.
 Hour: the hour-of-day (16:00 to 18:00) impact on Houston’s demand distribution is akin to that of the summer-month dummy variable.
 DAM price: DAM prices are known a day ahead. Houston’s DAM price effect on demand varies from negative (lower tails) to positive (upper tails). That is, for each USD1/MWh change in DAM prices known on day $t-1$, demand could decrease by roughly 10 MWh or increase by as much as 30 MWh on day $t$. Note that the mean estimate of this marginal effect is clearly biased, especially in the upper tails.
 Lagged load: demand on day $t$ is an increasing function of demand on day $t-1$ up to the 80th percentile, after which it declines. This parabolic relationship is completely missed by the static, linear mean estimate.
 2hrMA RTM: this variable is day $t$’s two-hour moving average of RTM prices in Houston. An increasing convex function, the ceiling effect of these prices is significantly larger than the floor effect. That is, RTM prices have a stronger positive impact on demand when demand is high than when demand is low.
 PriceSpike300: this indicator variable captures the impact of DAM prices exceeding USD300. In the lower tails of the demand distribution, the marginal effect of DAM prices exceeding USD300 is positive; that is, demand increases. But the impact is negative in the upper tails. Note, however, that, unlike the other independent variables, this covariate has very wide confidence bands for the quantile and least-squares estimates, suggesting considerable uncertainty in its impact on demand.
 PriceSpike2000: this indicator variable captures the impact of DAM prices exceeding USD2000. The marginal effect of DAM prices exceeding USD2000 is slightly positive in the lower tails of the demand distribution, and is negative in the upper tails. That is, consistent with expectations, very large price spikes are negatively associated with very high demand; conversely, very large price spikes are positively, but weakly, related to demand when demand is low. The static mean regression estimate of this dummy variable, like the others, fails to capture such insights.
We next discuss the MLE and OLS estimates for the coefficients given in Table 4; these are the empirical values corresponding to the plots in Figure 7.
| Quant | Intercept | Temp | Trans | Summer | Hour | Lagged DAM price | Lagged load | RTM price | Price 300 | Price 2000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | $-$9 745.293 | 85.109 | 2 550.72 | $-$97.99 | $-$89.164 | $-$8.926 | 0.503 | 0.598 | 1 660.938 | 3 457.539 |
| | (2 016.733)$^{***}$ | (3.572)$^{***}$ | (182.327)$^{***}$ | (50.096)$^{*}$ | (65.682)$^{\dagger}$ | (2.321)$^{***}$ | (0.013)$^{***}$ | (0.013)$^{\dagger}$ | (1 193.991)$^{\dagger}$ | (1 674.522)$^{\dagger}$ |
| 0.05 | $-$6 345.56 | 69.699 | 1 863.415 | 56.359 | 42.499 | $-$3.803 | 0.565 | 0.513 | 1 007.319 | 1 839.708 |
| | (1 346.777)$^{***}$ | (2.081)$^{***}$ | (190.188)$^{***}$ | (33.939)$^{\dagger}$ | (36.715)$^{\dagger}$ | (2.047)$^{\dagger}$ | (0.01)$^{***}$ | (0.01)$^{***}$ | (769.718)$^{\dagger}$ | (982.776)$^{\dagger}$ |
| 0.25 | $-$1 419.799 | 39.894 | 749.411 | 103.312 | 25.74 | 0.46 | 0.754 | 0.356 | $-$679.698 | 487.961 |
| | (655.793)$^{*}$ | (0.926)$^{***}$ | (161.265)$^{***}$ | (17.483)$^{***}$ | (22.385)$^{\dagger}$ | (0.457)$^{\dagger}$ | (0.004)$^{***}$ | (0.004)$^{***}$ | (595.113)$^{\dagger}$ | (275.994)$^{\dagger}$ |
| 0.5 | $-$619.332 | 27.328 | 322.61 | 54.351 | 97.614 | 1.075 | 0.839 | 0.41 | $-$167.646 | $-$22.605 |
| | (730.166)$^{\dagger}$ | (0.806)$^{***}$ | (112.245)$^{**}$ | (15.854)$^{***}$ | (23.013)$^{***}$ | (0.762)$^{\dagger}$ | (0.005)$^{***}$ | (0.005)$^{***}$ | (660.204)$^{\dagger}$ | (170.2)$^{\dagger}$ |
| 0.75 | 2 015.5 | 19.94 | $-$23.087 | 100.129 | 151.204 | 3.635 | 0.847 | 1.003 | $-$691.734 | $-$1 189.598 |
| | (865.866)$^{*}$ | (0.729)$^{***}$ | (183.621)$^{\dagger}$ | (21.565)$^{***}$ | (21.552)$^{***}$ | (1.183)$^{**}$ | (0.006)$^{***}$ | (0.006)$^{***}$ | (490.101)$^{\dagger}$ | (672.677)$^{\dagger}$ |
| 0.95 | 9 270.176 | 15.695 | 182.264 | 336.391 | 298.044 | 17.342 | 0.731 | 5.427 | $-$790.101 | $-$7 134.632 |
| | (3 711.778)$^{*}$ | (1.384)$^{***}$ | (331.882)$^{\dagger}$ | (38.51)$^{***}$ | (60.089)$^{***}$ | (3.473)$^{***}$ | (0.012)$^{***}$ | (0.012)$^{***}$ | (1 255.587)$^{\dagger}$ | (3 420.917)$^{*}$ |
| 0.99 | 16 133.659 | 12.247 | $-$724.534 | 262.755 | 466.414 | 28.948 | 0.687 | 5.932 | $-$1 953.889 | $-$11 103.749 |
| | (5 474.131)$^{**}$ | (2.443)$^{***}$ | (254.986)$^{**}$ | (77.47)$^{***}$ | (129.765)$^{***}$ | (6.437)$^{***}$ | (0.023)$^{***}$ | (0.023)$^{***}$ | (1 789.497)$^{\dagger}$ | (5 326.133)$^{*}$ |
| OLS | 194.985 | 27.863 | 535.344 | 181.097 | 136.338 | 0.721 | 0.786 | 3.282 | $-$315.282 | 88.476 |
| | (34.166)$^{***}$ | (0.552)$^{***}$ | (145.175)$^{***}$ | (15.404)$^{***}$ | (16.950)$^{***}$ | (0.251)$^{**}$ | (0.003)$^{***}$ | (0.134)$^{***}$ | (261.311)$^{\dagger}$ | (873.454)$^{\dagger}$ |
Barring the two price-spike dummy variables, the OLS estimates for all the other coefficients are significant at least at the 0.01 level of significance. Likewise, the quantile regression coefficients for these dummy variables are not significant, except for the price spike USD2000 dummy variable, which is significant ($^{*}$) at the two uppermost quantiles. Most of the quantile regression coefficient estimates for the other variables are significant. This is consistent with the plots. Thus, the empirical summaries further validate our empirical claim that to better understand the marginal effects of key variables on demand for electricity in ERCOT, it is critical not to rely on conditional mean regressions.
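The earlier observation that the quantile estimates mostly fall outside the mean regression confidence interval can be checked directly from Table 4. The short sketch below does so for the temperature column; the estimates and the OLS standard error are taken from the table, while the use of a normal-approximation interval (multiplier 1.96) is an assumption, since the source does not state how its intervals were formed.

```python
# Temperature coefficients from Table 4 (quantile regressions and OLS).
quantiles = [0.01, 0.05, 0.25, 0.50, 0.75, 0.95, 0.99]
temp_qr = [85.109, 69.699, 39.894, 27.328, 19.94, 15.695, 12.247]
ols_est, ols_se = 27.863, 0.552

# 95% interval for the OLS slope under a normal approximation (assumption).
z = 1.96
lower, upper = ols_est - z * ols_se, ols_est + z * ols_se

# Quantile levels whose estimate lies outside the OLS interval.
outside = [tau for tau, b in zip(quantiles, temp_qr) if not (lower <= b <= upper)]

print(round(lower, 3), round(upper, 3), outside)
```

Only the median estimate falls inside the interval, so a pure location-shift model, in which every quantile shares the OLS slope, is hard to reconcile with the temperature estimates.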
Given the above, which of the nine independent variables we have chosen to work with to explain Houston’s demand are really relevant? To answer this question, we could use a variable selection procedure at every quantile shown in Table 4. But, from a practical perspective, it may suffice to work with just one quantile, namely the 50th percentile (or median). It is evident from the plots that the mean generally over- or underestimates the marginal effects as it gets pulled in the direction of the asymmetry of the demand distribution. However, the median, as a measure of central tendency, does not. Hence, in the next subsection, we only discuss variable selection at the median of the demand distribution.^{5} [Footnote 5: We also carried out this variable selection at each of the chosen quantiles detailed in Table 4 and Figure 7. Since the overall conclusions are qualitatively similar, they are omitted due to space restrictions and are available from the authors on request.] For comparison, we also execute a standard stepwise variable selection at the mean.
4.1.1 Variable selection for Houston demand model
Using the BIC, we employ variable selection for the Houston demand models. The variables selected under the median (50th percentile) and mean regressions are summarized in Table 5. Note that, of the nine variables, the mean regression selects six covariates, whereas the median regression selects four; three variables are common to both the mean and median models: temperature, lagged load and 2hrMA RTM price. Interestingly, the variables that are discarded by the median regression coincide with those that were not significant and/or had large variances from the analysis in Section 4.1. Finally, the lagged DAM price is relevant under the median model but not the mean model. The latter model, in turn, selects the summer, hour and transmission dummy variables. These dummy variables are precisely the ones that represent the tails of the demand distribution; hence, the mean is more sensitive to them than the median. Indeed, by construction, lagged load, to some extent, captures the impact of these dummy variables. Thus, in addition to being parsimonious, the variables chosen under the median variable selection procedure reflect the changing marginal effects on the entire conditional distribution of demand better than the variables chosen under the conditional mean alone.
| Variable | Median | Mean |
|---|---|---|
| Intercept | $-$416.688 | 181.829 |
| Temp | 27.838 | 27.779 |
| Trans | 0 | 572.186 |
| Summer | 0 | 177.082 |
| Hour | 0 | 140.770 |
| Lagged DAM price | 0.963 | 0 |
| Lagged load | 0.846 | 0.790 |
| RTM price | 3.486 | 3.337 |
| Price300 | 0 | 0 |
| Price2000 | 0 | 0 |
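The selected subsets can be read off Table 5 mechanically: a variable is selected when its coefficient is nonzero. The sketch below, which uses only the Table 5 values (intercept excluded), tallies the two subsets and their overlap; the dictionary layout and variable names are illustrative choices.

```python
# Table 5 coefficients as (median, mean); a zero means "not selected".
table5 = {
    "Temp": (27.838, 27.779),
    "Trans": (0.0, 572.186),
    "Summer": (0.0, 177.082),
    "Hour": (0.0, 140.770),
    "Lagged DAM price": (0.963, 0.0),
    "Lagged load": (0.846, 0.790),
    "RTM price": (3.486, 3.337),
    "Price300": (0.0, 0.0),
    "Price2000": (0.0, 0.0),
}

median_set = {name for name, (med, _) in table5.items() if med != 0.0}
mean_set = {name for name, (_, avg) in table5.items() if avg != 0.0}

common = median_set & mean_set          # selected by both methods
median_only = median_set - mean_set     # selected at the median only
mean_only = mean_set - median_set       # selected at the mean only

print(len(median_set), len(mean_set), sorted(common))
print(sorted(median_only), sorted(mean_only))
```

The tally gives four variables at the median and six at the mean, with temperature, lagged load and the RTM price shared by both models.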
4.1.2 Variable selection for Houston DAM price model
Recall that there are thirteen independent variables in each of the four regions’ DAM price models. Using the analytic process described for the Houston demand model in Sections 3 and 4.1.1, consider the variables for Houston’s DAM equation selected under the median and mean regressions, which are summarized in Table 6. Five variables are left out in the median regression, while only two are deemed irrelevant in the mean variable selection process.
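| Variable | Median | Mean |
|---|---|---|
| Intercept | $-$0.089 | $-$3.667 |
| Natural gas price | 0.141 | 1.172 |
| Wind | 0 | 0 |
| Nuclear | 0 | $-$0.001 |
| REGUPP | 0.006 | 0.040 |
| RRSP | 0.017 | 0.076 |
| NSPINP | $-$0.005 | $-$0.068 |
| REGUPQ | $-$0.001 | $-$0.006 |
| RRSQ | 0 | 0 |
| NSPINQ | 0 | 0.001 |
| DAM1 | 0.36 | 0.555 |
| DAM2 | 0.65 | 0.524 |
| DAM3 | $-$0.02 | $-$0.121 |
| Load forecast | 0 | 0.001 |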
4.1.3 Variable selection for Houston RTM price model
Recall that there are seven independent variables in each of the four regions’ RTM price models. Using the same process as before, consider the variables for Houston’s RTM equation, which are selected under the median and mean regressions summarized in Table 7. Unlike the demand and DAM price models for Houston, for RTM prices, the same set of independent variables is selected by both the mean and median regressions. However, even in this instance, the median maximum likelihood estimates of the parameters are better than the corresponding mean estimates, since the latter are adversely influenced by extreme values and fat tails. Indeed, the MLE and OLS estimates are markedly different, which implies that the marginal effects of these covariates under the median and mean models are substantially different.
| Variable | Median | Mean |
|---|---|---|
| Intercept | 0.012 | $-$4.343 |
| DAM | 0.001 | 0.078 |
| LFE | 0 | 0 |
| ORTLFE | 0 | 0 |
| RTM1 | 0.725 | $-$0.214 |
| RTM2 | 0.263 | 0.905 |
| RTM3 | 0.009 | 0.278 |
| WFE | 0 | 0 |
4.1.4 Variable selection for North, South and West’s demand and price models
We mimic the Houston analyses for North, South and West’s demand and price models. For brevity, we omit the details. Here, we summarize the main conclusions.
 (1) Using the median regression variable selection process, the set of independent variables in Houston is also relevant for the other zones. This robust result is very encouraging. However, this is untrue for the mean regression variable selection across the four zones.
 (2) Figure 7 clearly shows the nonlinear relationships between Houston’s demand and its covariates. The plots for the other regions are similar.
 (3) DAM price models: like the demand models for the four regions, here there is considerable overlap in the variables that are selected for each of the four regions’ DAM pricing models. South is the only region that shows some minor differences in terms of which variables impact its DAM price distribution. The mean regression models once again select more variables than the median regressions for all these three zones’ DAM models, a less parsimonious outcome that is not surprising given the marginal effects are static under the mean regression approach.
 (4) RTM price models: recall from above that, in the RTM pricing model for Houston, the same set of variables was chosen by both the median and mean regression variable selection processes. This is generally true for the remaining three regions’ RTM pricing models: of the seven independent variables, the DAM price in the region of interest, and the three appropriate RTM prices in the regions that appear as independent variables, are always relevant (barring a couple of instances) under both the median and mean regression variable selection procedures.
5 Discussion
An understanding of the variables impacting price formation in restructured electricity markets is of great interest to electricity generators, retailers, utilities and policymakers, while an understanding of the determinants of the demand for electricity is of critical importance to system operators and planners striving to maintain reliability. Often linear or log-linear regression models are used to explain price patterns and to explain and forecast demand. Yet, the extreme volatility in electricity prices presents challenges. The variables that might explain a spike in energy prices can differ from those that might explain patterns at lower price levels. Similarly, weather variables might be important determinants of the level of demand when demand is high but may have limited explanatory power when demand is relatively low. Using quantile regressions and variable selection methods, we identified the strongest determinants of price and demand variables when these variables were at different levels or percentiles. A large data set from the ERCOT market was used.
We found that quantile regressions are better suited than mean regressions to model the distributions of real-time and day-ahead prices as well as electricity demand in four of the largest ERCOT zones (Houston, North, South and West); these regions serve 85% of the electrical needs of the state with the largest electricity consumption in the United States. Using the BIC metric, a quantile variable selection method showed that the same four of the nine variables for the demand models are most relevant in all four regions, a robust and parsimonious finding. For the day-ahead prices, eight out of thirteen independent variables are relevant; barring the South region, this too is a robust result. For the real-time price distribution, four of the seven covariates are relevant for all regions; three out of these four are the same. In contrast, for the demand and price models, the corresponding mean regression variable selections typically yield larger subsets of relevant variables, a less parsimonious outcome; importantly, given the non-normal nature of the price and demand distributions, the marginal impacts of the key drivers of price and demand from the mean regressions over- or underestimate the relationships. This was further confirmed by examining appropriate quantile functional plots that depict highly nonlinear relationships between the covariates and the response variables. Finally, since it is impossible to know the “true” model, the quantile regression variable selection approach used in this paper could, we hope, help guide practitioners in the direction of the variables that are most important to modeling demand in ERCOT’s four largest zones. In addition to sound contextual insights, ERCOT needs to gather data on fewer variables to better understand price and demand fluctuations, leading to more efficient use of time and resources.
It would be interesting to use quantile regressions to study the impacts of wind and solar generation on wholesale market prices. Zarnikau et al (2016) study the effects of wind-generation development on day-ahead and real-time electricity market prices in Texas using standard methods. Only recently has there been sufficient solar energy development in Texas to enable any analysis of its impact on prices. An initial quantification of the effect of solar energy development on prices is provided in Zarnikau et al (2019b).
Another area for future study is to use the variable selection methodology in this paper to select variables for predicting the demand distribution at particular hours, rather than by zone, as in this study. Variables that impact wholesale consumption in the morning hours might be different from those in the evening hours.
Declaration of interest
The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.
References
 Bessa, R., Miranda, V., Botterud, A., Zhou, Z., and Wang, J. (2012). Time-adaptive quantile-copula for wind power probabilistic forecasting. Renewable Energy 40(1), 29–39 (https://doi.org/10.1016/j.renene.2011.08.015).
 Cabrera, B., and Schulz, F. (2017). Forecasting generalized quantiles of electricity demand: a functional data approach. Journal of the American Statistical Association 112, 127–136 (https://doi.org/10.1080/01621459.2016.1219259).
 Damette, O., and Delacote, P. (2012). On the economic factors of deforestation: what can we learn from quantile analysis? Economic Modelling 29(6), 2427–2434 (https://doi.org/10.1016/j.econmod.2012.06.015).
 Damien, P., Fuentes-Garcia, R., Mena, R. H., and Zarnikau, J. (2019). Impacts of day-ahead versus real-time market prices on wholesale electricity demand in Texas. Energy Economics 81, 259–272 (https://doi.org/10.1016/j.eneco.2019.04.008).
 Fan, J., and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360 (https://doi.org/10.1198/016214501753382273).
 Hagfors, L. I., Bunn, D., Kristoffersen, E., Staver, T., and Westgaard, S. (2016a). Modeling the UK electricity price distributions using quantile regression. Energy 102, 231–243 (https://doi.org/10.1016/j.energy.2016.02.025).
 Hagfors, L. I., Kamperud, H., Paraschiv, F., Prokopczuk, M., Sator, A., and Westgaard, S. (2016b). Prediction of extreme price occurrences in the German day-ahead electricity market. Quantitative Finance 16(12), 1929–1948.
 He, Y., Xu, Q., Wan, J., and Yang, S. (2016). Short-term power load probability density forecasting based on quantile regression neural network and triangle kernel function. Energy 114, 498–512 (https://doi.org/10.1016/j.energy.2016.08.023).
 Heckman, J., Smith, J., and Clements, N. (1997). Making the most out of programme evaluations and social experiments: accounting for heterogeneity in programme impacts. Review of Economic Studies 64(4), 487–535 (https://doi.org/10.2307/2971729).
 Koenker, R., and Bassett, G., Jr. (1978). Regression quantiles. Econometrica 46, 33–50 (https://doi.org/10.2307/1913643).
 Koenker, R., and Bassett, G., Jr. (1982). Robust tests for heteroscedasticity based on regression quantiles. Econometrica 50, 43–61 (https://doi.org/10.2307/1912528).
 Koenker, R., and d’Orey, V. (1987). Algorithm AS 229: computing regression quantiles. Applied Statistics 36(3), 383–393 (https://doi.org/10.2307/2347802).
 Koenker, R., and Hallock, K. (2001). Quantile regression. The Journal of Economic Perspectives 15, 143–156 (https://doi.org/10.1257/jep.15.4.143).
 Lebotsa, M., Sigauke, C., Bere, A., Fildes, R., and Boylan, J. (2018). Short-term electricity demand using partially linear additive quantile regression with an application to the unit commitment problem. Applied Energy 222, 104–118 (https://doi.org/10.1016/j.apenergy.2018.03.155).
 Lee, E., Noh, H., and Park, B. (2014). Model selection via Bayesian information criterion for quantile regression models. Journal of the American Statistical Association 109, 216–229 (https://doi.org/10.1080/01621459.2013.836975).
 Li, Z., Hurn, A., and Clements, A. (2017). Forecasting quantiles of day-ahead electricity load. Energy Economics 67, 60–71 (https://doi.org/10.1016/j.eneco.2017.08.002).
 Machado, J. (1993). Robust model selection and $M$-estimation. Econometric Theory 9(3), 478–493 (https://doi.org/10.1017/S0266466600007775).
 Maciejowska, K., Nowotarski, J., and Weron, R. (2016). Probabilistic forecasting of electricity spot prices using factor quantile regression averaging. International Journal of Forecasting 32(3), 957–965 (https://doi.org/10.1016/j.ijforecast.2014.12.004).
 Mosteller, F., and Tukey, J. (1977). Data Analysis and Regression: A Second Course in Statistics. Addison Wesley, Boston, MA.
 Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Annals of Statistics 12(2), 758–765 (https://doi.org/10.1214/aos/1176346522).
 Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics 6(2), 461–464 (https://doi.org/10.1214/aos/1176344136).
 Sherwood, B., and Maidman, A. (2017). rqPen: penalized quantile regression. R package, Version 2.0.
 Taylor, J. (2019). Forecasting value at risk and expected shortfall using a semiparametric approach based on the asymmetric Laplace distribution. Journal of Business and Economic Statistics 37(1), 121–133 (https://doi.org/10.1080/07350015.2017.1281815).
 Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society B 58(1), 267–288 (https://doi.org/10.1111/j.2517-6161.1996.tb02080.x).
 Treadway, N. (2015). The annual baseline assessment of choice in Canada and the United States (ABACCUS). Report, Distributed Energy Financial Group. URL: https://bit.ly/38mS4RQ.
 Tsai, C., and Eryilmaz, D. (2018). Effect of wind generation on ERCOT nodal prices. Energy Economics 76, 21–33 (https://doi.org/10.1016/j.eneco.2018.09.021).
 Wang, H., Li, G., and Jiang, G. (2007a). Robust regression shrinkage and consistent variable selection through the LAD-Lasso. Journal of Business and Economic Statistics 25(3), 347–355 (https://doi.org/10.1198/073500106000000251).
 Wang, H., Li, R., and Tsai, C. (2007b). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94(3), 553–568 (https://doi.org/10.1093/biomet/asm053).
 Wang, L., Wu, Y., and Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. Journal of the American Statistical Association 107, 214–222 (https://doi.org/10.1080/01621459.2012.656014).
 Woo, C.-K., Zarnikau, J., Moore, J., and Horowitz, I. (2011). Wind generation and zonal-market price divergence: evidence from Texas. Energy Policy 39(7), 3928–3938 (https://doi.org/10.1016/j.enpol.2010.11.046).
 Woo, C.-K., Horowitz, I., Horii, B., Orans, R., and Zarnikau, J. (2012). Blowing in the wind: vanishing payoffs of a tolling agreement for natural-gas-fired generation of electricity in Texas. Energy Journal 33(1), 207–229 (https://doi.org/10.5547/ISSN01956574EJVol33No18).
 Wu, Y., and Liu, Y. (2009). Variable selection in quantile regression. Statistica Sinica 19, 801–817.
 Zarnikau, J., Woo, C.-K., Gillett, C., Ho, T., Zhu, S., and Leung, E. (2014). Day-ahead forward premiums in the Texas electricity market. Journal of Energy Markets 8(2), 1–20 (https://doi.org/10.21314/JEM.2015.126).
 Zarnikau, J., Woo, C.-K., and Zhu, S. (2016). Zonal merit-order effects of wind generation development on day-ahead and real time electricity market prices in Texas. Journal of Energy Markets 9(4), 17–47 (https://doi.org/10.21314/JEM.2016.153).
 Zarnikau, J., Woo, C.-K., Zhu, S., and Tsai, C. (2019a). Market price behavior of wholesale electricity products: Texas. Energy Policy 125, 418–428 (https://doi.org/10.1016/j.enpol.2018.10.043).
 Zarnikau, J., Woo, C.-K., Zhu, S., and Tsai, C.-H. (2019b). Will Texas’ operating reserve demand curve likely provide adequate investment incentive for natural-gas-fired generation? Journal of Energy Policy, forthcoming.