Are there multiple independent risk anomalies in the cross section of stock returns?

Benjamin R. Auer and Frank Schuhmacher

Need to know

• We show that many risk measures represent independent low-risk anomalies.
• Consequence 1: asset pricing models need to consider various types of risk.
• Consequence 2: stock selection can benefit from using more than one risk measure.

Abstract

Using multivariate portfolio sorts, firm-level cross-sectional regressions and spanning tests, we show that, in the cross section of stock returns, most commonly used risk measures in academia and in practice are separate return predictors with negative slopes. That is, in contrast to what many researchers might expect, there are multiple risk anomalies that are independent of each other. This implies that, in empirical asset pricing models, even different forms of total risk can be simultaneously relevant. Further, it suggests that investors trading based on one risk measure can obtain significant gains when also trading based on another. For example, an investor selecting stocks based on volatility can earn a significant monthly alpha by also considering the information contained in the maximum drawdown.

1 Introduction

Frazzini and Pedersen (2014) provide strikingly robust evidence on the low-risk anomaly, according to which low-risk stocks have historically provided higher risk-adjusted returns than their riskier counterparts. They find that portfolio Sharpe ratios and alphas are almost monotonically declining with rising beta for US and international equities, Treasury bonds, corporate bonds, and for futures on exchange rates and commodities. Further, their betting-against-beta (BAB) factors, which are constructed for each asset class by buying low-beta assets and selling high-beta assets, produce significant abnormal returns that cannot be explained by traditional, rational theories of asset prices or the returns of classic factor portfolios.

These results are not completely new. Black (1972) and Black et al (1972) had already found that the relationship between risk and return was in fact positive, but much flatter than predicted by the classic capital asset pricing model (CAPM). Haugen and Heins (1975) even found a negative relationship between risk and return, meaning that risk did not generate a special reward in their sample. Twenty years later, Fama and French (1992) also observed a flat relationship between beta and return. Another 15 years later, Ang et al (2006b) discovered a strong negative relationship between idiosyncratic volatility and average returns, which attracted renewed attention to this kind of result (see Bali and Cakici 2008; Ang et al 2009; Fu 2009). More recently, Baker et al (2011) and Dutt and Humphery-Jenner (2013) provided similar results for another widely used measure of risk: total volatility. They stated that the long-term outperformance of low-risk portfolios is perhaps the greatest anomaly in finance because its magnitude and robustness together challenge the basic notion of a risk–return trade-off.

Researchers have proposed several explanations for the low-risk anomaly (see Blitz and van Vliet 2007). For example, Frazzini and Pedersen (2014) argue that restricted borrowing may create strong demand for high-risk stocks, whereas Baker et al (2011) state that the anomaly may be related to fund managers’ mandate to beat fixed benchmarks. Further, Fiore and Saha (2015) point out that the effect may be caused by a summer–winter seasonality in stock returns. Similarly, Antoniou et al (2016) attribute the low-risk anomaly to investor sentiment: periods of optimism attract equity investment in risky opportunities by unsophisticated, overconfident traders. Finally, Baker et al (2011) nicely summarize that the low-risk anomaly could exist simply because of using the “wrong” measure of risk.

Over the past decades, academics have devoted considerable energy to developing rational models based on alternative measures of risk. For example, Markowitz (1959) proposed semivariance as an alternative to variance because the former seemed to him the more plausible risk measure; however, because of its greater computational difficulty, he suggested first gaining experience with the variance. Hence, the first CAPM developed by Sharpe (1964), Lintner (1965) and Mossin (1966) used the mean–variance framework. Ten years later, Bawa and Lindenberg (1977) replaced the variance with the lower partial moment (LPM) and formulated a CAPM in a mean-LPM framework. The security market line in this framework is identical in form to the traditional CAPM except that the traditional beta is replaced by an LPM-based beta. This model has been generalized by Harlow and Rao (1989) such that a large class of pricing models using alternative risk measures (variance, semivariance, semideviation, probability of loss, etc) become special cases of the new framework. A further modification is the drawdown-based CAPM of Zabarankin et al (2014), which establishes a linear relationship between the expected return of an asset and its drawdown beta. Although both these models (Bawa and Lindenberg 1977; Zabarankin et al 2014) measure systematic risk in different ways, they are similar in that only systematic risk is priced. A different approach comes from Merton (1987), who shows that in an information-segmented market there should be a positive relationship between firm-specific volatility and average returns to compensate investors for holding imperfectly diversified portfolios. That is, if investors cannot hold the market portfolio, they will care about total risk, not simply market risk. In addition to this research, there is a huge body of literature on mean-risk models, many of which have not yet been used to modify the CAPM.
Important models have been suggested and analyzed by, for example, Roy (1952), Porter (1974), Bawa (1975, 1978), Fishburn (1977), Yitzhaki (1982), Konno and Yamazaki (1991), Ogryczak and Ruszczyński (1999), Rockafellar and Uryasev (2000), Krokhmal et al (2002), Gaivoronski and Pflug (2005) and Homm and Pigorsch (2012).

Given this wide variety of available risk measures, could one of them be considered the correct measure of risk that provides a robust positive risk–return relationship and thus solves the low-risk puzzle? To find an answer, we start our analysis by investigating whether anomalous effects occur for the 25 (specifications of) risk measures most commonly used in practice and in academia. This selection of risk measures includes metrics of total risk (eg, LPMs, several value-at-risk (VaR) variants and drawdown-based measures), systematic risk (eg, betas based on LPMs and drawdowns) and unsystematic risk (eg, idiosyncratic volatility from single- and multifactor models). Quantile and long–short arbitrage portfolios constructed based on monthly US stock market data covered by the Center for Research in Security Prices (CRSP) database from 1926 to 2014 reveal a low-risk anomaly for each risk measure and similar anomaly magnitudes across conceptually very different risk measures. For example, we document an anomaly for more complex measures fitting specific return distributions, and even when using only the lowest daily return (of the previous 12 months) to quantify risk. Further, as a corollary of our study, we highlight that the occurrence and similarity of the low-risk anomalies across different risk measures are not driven by illiquid stocks. For example, the anomalies are strong even in the small and highly liquid subset of Dow Jones Industrial Average (DJIA) stocks. This is interesting because in such efficiently priced segments, we would expect anomalies to be weak or nonexistent (see Chordia et al 2008, 2014).

In search of an explanation for the performance similarity across risk measures, we implement bivariate portfolio sorts, firm-level cross-sectional regressions and spanning tests, which are typically used to answer the question of whether one anomalous cross-sectional effect simply captures another one or whether it is indeed a separate phenomenon. These methodologies have been used, for example, to distinguish different types of momentum strategies (see Chan et al 1996; Novy-Marx 2015). Two of these strategies are typically said to have separate explanatory power for future stock returns, ie, one strategy does not subsume the other if, first, return spreads occur in both directions of the bivariate portfolio sorts; second, both are significant in cross-sectional regressions; and third, a significant alpha remains when regressing one strategy return on the other in spanning tests. In our context, researchers believing in the CAPM might argue that the low-risk anomaly originates from beta (eg, because the literature has most extensively documented the anomaly for this measure) and that the other risk measures simply approximate this dominant effect. Similarly, others might say that the anomaly originates from another risk measure and is approximated by the remaining measures (eg, by different forms of betas capturing the systematic component of a measure of total risk). To shed light on this issue, we implement the three anomaly-separating techniques for all possible combinations of risk measures.
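The third anomaly-separating technique, the spanning test, can be illustrated with a small sketch: regress one strategy's returns on the other's and test whether a significant intercept (alpha) remains. The returns below are simulated for illustration only; the strategy names and parameter values are our own assumptions, not the paper's data.

```python
import numpy as np

def spanning_alpha(r_a, r_b):
    """OLS regression of strategy A's returns on strategy B's returns.
    Returns the intercept (alpha) and its t-statistic; a significant
    alpha indicates that strategy B does not subsume strategy A."""
    X = np.column_stack([np.ones_like(r_b), r_b])
    coef, *_ = np.linalg.lstsq(X, r_a, rcond=None)
    resid = r_a - X @ coef
    s2 = resid @ resid / (len(r_a) - 2)        # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)          # coefficient covariance
    return coef[0], coef[0] / np.sqrt(cov[0, 0])

# Simulated monthly returns: A loads on B but keeps its own alpha.
rng = np.random.default_rng(0)
r_b = rng.normal(0.01, 0.05, 240)
r_a = 0.005 + 0.4 * r_b + rng.normal(0.0, 0.01, 240)
alpha, t_alpha = spanning_alpha(r_a, r_b)
```

In the paper's application, such regressions are run for all pairs of BAR portfolio returns (usually with additional factor controls); a significant alpha in both directions marks two separate anomalies.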

In our portfolio sorts and cross-sectional regressions, there appears to be no dominant risk variable in the cross section of stock returns (ie, no main driver of the anomaly) whose effects are captured by all alternative measures. Instead, we detect many cases of more than one significant risk variable with a negative impact on returns. Such a finding might be intuitive when differentiating between measures of total, systematic and unsystematic risk. However, it also occurs among, for example, measures of total risk with high rank correlation, where many researchers (including ourselves) would expect the risk measures to add little value to one another. Therefore, others (see, for example, Gebhardt et al 2005; Bali et al 2011) might argue that our results could be caused by the well-known limitations of portfolio sorts and cross-sectional regressions. However, our spanning tests, which are currently considered to be the best available methodology (see Novy-Marx 2012, 2013, 2014, 2015), provide similar insights.

Apart from the sign of the risk–return relationship, our results support the general idea of theoretical models with more than one measure of risk in the pricing equation (see, for example, Friend and Westerfield 1980; Fang and Lai 1997) and, in particular, the relevance of total risk as suggested by Merton (1987). With respect to the latter, we can even go one step further by arguing that different forms of total risk matter. Traders using one specific risk measure for stock selection (eg, the standard deviation) can gain significantly by relying on an additional measure (eg, the maximum drawdown). This finding is in contrast to studies arguing that high rank correlation makes the choice between measures of total risk largely irrelevant (see Pfingsten et al 2004). Consequently, our results are of interest not only for asset pricing, but also for practical investment decisions and especially for the construction of (risk-focused) funds, which is a hot topic for large investment firms (see Asness et al 2012; Moreira and Muir 2017). Our finding of negative slopes for all risk measures is not a crucial problem for such applications. However, in the spirit of Novy-Marx (2012), it poses a significant difficulty for stories that purport to explain the low-risk anomaly, because none of the popular explanations imply independent anomalies for all kinds of risk measures.

The remainder of our paper is organized as follows. Section 2 systematizes and defines our selection of risk measures and briefly describes our stock market data sources (readers with detailed background knowledge may skip this section, as the rest of the paper can be understood without these prerequisites). Section 3 documents the existence of low-risk anomalies for our risk measures via quantile and arbitrage portfolios. Section 4 contains our study of anomaly interlinkages based on rank correlations, portfolio sorts, cross-sectional regressions and spanning tests. Section 5 supplements our empirical analysis with a variety of robustness checks and discusses the connection of our findings to other measures (including the maximum return and skewness) that have also recently been discussed in the context of investment risk. Finally, Section 6 presents our conclusions and outlines directions for future research.

2 Risk measures and equity data

2.1 Risk measure definitions

A systematic review of the literature reveals a large variety of frequently used risk measures in practice and in academia. Following our discussion in the previous section, Table 1 subdivides them into metrics of total, systematic and unsystematic risk. Within each of these categories, we can differentiate classic, LPM-based, VaR-based and drawdown-based measures. However, some combinations of main groups and subgroups (eg, betas focusing on VaR or unsystematic risk measures using LPMs, VaR or drawdowns) have not yet been considered in the literature. Kaplanski (2004) proposes a first beta based on conditional VaR but, so far, no empirically tractable formula for this measure allows its use in typical practical applications. We do not intend to fill these gaps; instead we analyze existing measures that have become standards in the literature.

2.1.1 Measures of total risk

Classic measures.

The first subgroup of total risk metrics contains simple classic measures with a long history in financial research. Probably the best-known metric of this category is the standard deviation (SD), which is a key risk measure in traditional portfolio optimization (see Markowitz 1952) and the main measure of total risk employed to document the low-risk anomaly (see Baker et al 2011; Dutt and Humphery-Jenner 2013). The mean absolute deviation (MAD) and the minimum return (MIN, the smallest order statistic), which also belong to this subgroup, are typically used in absolute deviation decision rules (Konno and Yamazaki 1991) and minimax portfolio selection (Young 1998), respectively.
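For illustration, the three classic measures can be computed from a return series as follows (a minimal sketch; the function name and toy values are ours):

```python
import numpy as np

def classic_total_risk(r):
    """Classic measures of total risk for a vector of daily returns:
    standard deviation (SD), mean absolute deviation (MAD) and the
    minimum return (MIN), i.e. the smallest order statistic."""
    return {
        "SD": np.std(r, ddof=1),
        "MAD": np.mean(np.abs(r - np.mean(r))),
        "MIN": np.min(r),
    }

r = np.array([0.01, -0.02, 0.005, -0.01, 0.03])  # toy return series
m = classic_total_risk(r)
```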

Lower partial moments.

The second category of total risk measures covers LPMs, which measure risk via the negative deviations of realized returns from a minimal acceptable return $\tau$, usually zero or the risk-free rate. The order $n$ of the LPM determines the extent to which return deviations are weighted; thus, $n$ should be higher for more risk-averse investors (see Fishburn 1977; Bawa 1975, 1978). Practitioners tend to set $n=1,2,3$, and the corresponding LPMs have attained the status of autonomous risk measures (see Eling and Schuhmacher 2007).
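The definition translates directly into code (a sketch under the stated convention that only shortfalls below $\tau$ enter the moment; names and toy values are ours):

```python
import numpy as np

def lpm(r, n=2, tau=0.0):
    """Lower partial moment of order n: the average n-th power of the
    shortfall of returns below the minimal acceptable return tau."""
    shortfall = np.maximum(tau - r, 0.0)   # zero when the return exceeds tau
    return np.mean(shortfall ** n)

r = np.array([0.02, -0.01, 0.03, -0.04, 0.01])  # toy return series
```

Higher orders $n$ punish large shortfalls more heavily, matching the risk-aversion interpretation above.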

Value-at-risk.

The third subgroup comprises metrics related to VaR, a risk measure that describes the maximum percentage loss of an investment that is not exceeded with a given confidence level $1-\alpha$. In our study, we focus on the five most frequently used variants. Standard VaR is formally defined using the $\alpha$-quantile of the return distribution, either under the assumption of a normal distribution (ND) or based on historical simulation (HS), which can capture nonnormality in returns (see Jorion 2007). Instead of these standard VaRs, the literature frequently considers conditional VaR (CVaR), also called expected shortfall (ES), which is known to have better properties (see Frey and McNeil 2002). It measures the expected percentage loss under the condition that VaR is exceeded, and it can also be obtained based on normality or HS (see McNeil et al 2005). Both VaR and ES, largely under the radar, have been used in the anomaly studies of Huffman and Moll (2013) and Atilgan et al (2020). Finally, more recent research concentrates on deriving VaR estimates from extreme value theory because it provides a firm theoretical foundation for modeling extreme events (see Gilli and Këllezi 2006). The basic idea of the most popular approach is to fit a generalized Pareto distribution (GPD) to return exceedances over a given threshold $u$ via pseudo maximum likelihood and then to use its estimated parameters to calculate extreme VaR (EVaR).
In a simulation of selected theoretical return distributions, McNeil and Frey (2000) show that EVaR estimates are characterized by crucial errors if $u$ is set too high, and they suggest that it should be selected such that at least 10% of the sample size is used for fitting the GPD. Even though there is extensive literature on the statistically optimal setting of $u$ (see Scarrott and MacDonald 2012), $u$ is customarily set manually because it reflects investors’ risk tolerances, which cannot be determined based purely on statistical theory (see Tsay 2005, Chapter 7.7).
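For concreteness, the HS and ND variants can be sketched as follows (EVaR, which additionally requires a GPD fit to threshold exceedances, is omitted; all function names and toy values are our own):

```python
import numpy as np
from statistics import NormalDist

def var_hs(r, alpha=0.05):
    """Historical-simulation VaR: the loss (as a positive number) at
    the alpha-quantile of the empirical return distribution."""
    return -np.quantile(r, alpha)

def es_hs(r, alpha=0.05):
    """Historical-simulation CVaR/ES: the average loss conditional on
    the return lying at or below the VaR threshold."""
    q = np.quantile(r, alpha)
    return -np.mean(r[r <= q])

def var_nd(r, alpha=0.05):
    """Normal-distribution VaR from the fitted mean and SD."""
    z = NormalDist().inv_cdf(alpha)
    return -(np.mean(r) + z * np.std(r, ddof=1))

r = np.linspace(-0.10, 0.10, 101)  # toy return grid
```

Note that, by construction, ES is at least as large as the corresponding VaR, reflecting that it averages the losses beyond the quantile.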

Drawdowns.

Our last group of total risk measures covers drawdowns, which, like VaR, focus on worst-case events and are particularly popular among investment professionals (see Bacon 2008). These are the maximum drawdown (MDD), the average drawdown (ADD), the drawdown deviation (DDD), the pain index (PI) and the ulcer index (UI). Following Schuhmacher and Eling (2011), we define drawdowns on the basis of cumulated uncompounded returns. While MDD relies on only one drawdown, ADD and DDD consider the $K$ most significant continuous drawdowns. In contrast, PI and UI are based on drawdowns from a previous peak and thereby incorporate the impact of the duration of drawdowns. In the case of no negative returns, the drawdown-based risk measures (and their subdrawdowns) are defined to have a value of zero.
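The MDD and UI, for example, can be computed from cumulated uncompounded returns roughly as follows (a sketch under our reading of the definitions; ADD, DDD and PI follow analogously from the same drawdown series):

```python
import numpy as np

def drawdown_measures(r):
    """MDD and UI from cumulated uncompounded returns (running sums).
    The drawdown at time t is the gap between the running peak of the
    cumulated return series and its current value; the ulcer index is
    the root mean square of these drawdowns."""
    cum = np.cumsum(r)
    peak = np.maximum.accumulate(cum)
    dd = peak - cum                        # drawdown from previous peak
    return {"MDD": dd.max(), "UI": np.sqrt(np.mean(dd ** 2))}

r = np.array([0.05, -0.02, -0.03, 0.04, -0.01])  # toy return series
d = drawdown_measures(r)
```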

2.1.2 Measures of systematic risk

Probably the best-known measure of systematic risk is the market beta, which we assign to the subcategory of classic measures. Betas are obtained using three alternative estimators. First, the textbook way of estimating beta is to use ordinary least squares to calculate the slope of a market model regression. Among others, $\beta^{\mathrm{TB}}$ was used by Baker et al (2011) and Chow et al (2014) to study the low-risk anomaly. In the following, all risk measures other than $\beta^{\mathrm{TB}}$ will be called alternative risk measures (ARMs). Second, Frazzini and Pedersen (2014) use a modified version of this beta in their analysis of the low-risk anomaly. In their $\beta^{\mathrm{FP}}$, volatilities and correlation are estimated separately. Specifically, they use a shorter horizon for volatilities (one year) than for the correlation (five years) to account for the fact that correlations move more slowly than volatilities (see De Santis and Gerard 1997). Further, they use one-day returns to estimate volatilities and overlapping three-day returns for correlations to control for nonsynchronous trading, which affects only correlations. Third, Tofallis (2008) points out that most classic interpretations of beta are not consistent with the formulas used to estimate them and proposes an alternative $\beta^{\mathrm{TO}}$ based on a geometric mean functional relationship that has several advantages over $\beta^{\mathrm{TB}}$ and $\beta^{\mathrm{FP}}$. For example, it treats both (asset and market) variables the same way and delivers a unique slope for the underlying structural relationship. There are two additional advantages. First, $\beta^{\mathrm{TO}}$ is in line with the relative volatility interpretation in the beta literature because its volatility ratio is not distorted by the numeric value of the correlation. Second, it is optimal in the sense that it involves both horizontal and vertical deviations from the regression line because it minimizes the sum of products of these deviations (see Woolley 1941).
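The two single-window estimators can be sketched as follows ($\beta^{\mathrm{FP}}$ is omitted because it combines two estimation windows; it amounts to the estimated correlation times the volatility ratio). The function names and toy data are our own illustration:

```python
import numpy as np

def beta_tb(r_i, r_m):
    """Textbook OLS beta: cov(asset, market) / var(market)."""
    return np.cov(r_i, r_m)[0, 1] / np.var(r_m, ddof=1)

def beta_to(r_i, r_m):
    """Tofallis (2008) beta from the geometric mean functional
    relationship: the ratio of volatilities, signed by the correlation,
    so the slope is not damped by the size of the correlation."""
    rho = np.corrcoef(r_i, r_m)[0, 1]
    return np.sign(rho) * np.std(r_i, ddof=1) / np.std(r_m, ddof=1)

# Toy check: a levered market position has both betas equal to the leverage.
rng = np.random.default_rng(1)
r_m = rng.normal(0.0, 0.01, 252)
r_i = 1.5 * r_m
```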

While these three betas are related to mean–variance theory, two others emerge from more recent portfolio models. Bawa and Lindenberg (1977) show that, within a mean-LPM framework, a different kind of beta is the appropriate measure of systematic risk. For orders of $n=1,2$, this $\beta^{\mathrm{LPM}(n)}$ is determined by the lower partial moment $\mathrm{LPM}_{m}(n)$ of the market $m$ and the co-lower partial moment $\mathrm{CLPM}_{mi}(n)$ between the asset $i$ and the market $m$, where $\tau$ is equal to the risk-free rate. In contrast, Zabarankin et al (2014) formulate a drawdown-based asset pricing model, where $\beta^{\mathrm{MDD}}$ depends on the (negative) maximum drawdown $\mathrm{MDD}_{m}$ of the market $m$ and the corresponding cumulative uncompounded return $\mathrm{CCUR}_{i}$ of asset $i$, which was realized in the period the market experienced its MDD. This beta can be generalized by using the average of the $\kappa$% largest market drawdowns in the denominator and the average of the corresponding CCURs of asset $i$ in the numerator (see Zabarankin et al 2014).
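Both alternative betas can be sketched in a few lines. The sign conventions below are one common reading of the definitions (with $\tau$ set to zero for excess returns), not necessarily the papers' exact notation, and the sketch assumes the market series contains at least one drawdown:

```python
import numpy as np

def beta_lpm2(r_i, r_m, tau=0.0):
    """Bawa-Lindenberg beta of order n = 2 (a sketch): co-lower partial
    moment of asset and market over the market's second-order LPM."""
    down = np.minimum(r_m - tau, 0.0)      # market shortfall below tau
    return np.mean((r_i - tau) * down) / np.mean(down ** 2)

def beta_mdd(r_i, r_m):
    """Drawdown beta (sketched after Zabarankin et al 2014): minus the
    asset's cumulated uncompounded return over the market's maximum
    drawdown episode, divided by the market's MDD."""
    cum_m = np.cumsum(r_m)
    dd = np.maximum.accumulate(cum_m) - cum_m
    trough = int(np.argmax(dd))                  # end of the market's MDD
    start = int(np.argmax(cum_m[: trough + 1]))  # preceding market peak
    ccur_i = np.sum(r_i[start + 1 : trough + 1])
    return -ccur_i / dd[trough]

r_m = np.array([0.02, -0.03, 0.01, -0.04, 0.05])  # toy market returns
```

As a sanity check, the market itself has a beta of one under both measures, and a position levered twice has a beta of two.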

2.1.3 Measures of unsystematic risk

Finally, metrics to capture unsystematic risk form the smallest group of risk measures. The standard measure for the unsystematic risk associated with an asset is its idiosyncratic volatility. It is quantified by the SD of the residuals $\varepsilon_{i}$ resulting from the estimation of simple regression models. While the market model (MM) has been used by Bali et al (2011) and Baker et al (2014), the low-risk anomaly studies of Ang et al (2006b, 2009) and Li et al (2014, 2016) employ the Fama and French (1993) three-factor model (FFM). The Carhart (1997) four-factor model (CHM) has not been used in a low-risk context so far.
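The common core of these three specifications is a factor regression whose residual SD is the idiosyncratic volatility; only the factor set changes. A minimal sketch (simulated data; names are ours):

```python
import numpy as np

def idio_vol(r_i, factors):
    """Idiosyncratic volatility: the SD of residuals from an OLS
    regression of asset (excess) returns on factor returns.  One factor
    column gives the market model; three or four give the FFM or CHM."""
    X = np.column_stack([np.ones(len(r_i)), factors])
    coef, *_ = np.linalg.lstsq(X, r_i, rcond=None)
    resid = r_i - X @ coef
    return np.std(resid, ddof=X.shape[1])

# Simulated daily data: true idiosyncratic SD of 0.005.
rng = np.random.default_rng(2)
mkt = rng.normal(0.0, 0.01, 252)
r_i = 0.001 + 1.2 * mkt + rng.normal(0.0, 0.005, 252)
iv = idio_vol(r_i, mkt)
```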

2.2 Risk measure parameterization

In empirical applications, the risk measures of Section 2.1 are usually calculated based on a one-year window of daily returns. This is because the accuracy of risk measure estimates improves with the sampling frequency (see Merton 1980) and because a one-year window captures variation in risk over time (see Patton and Timmermann 2010) while guaranteeing lower portfolio turnover than shorter windows of, for example, monthly size (see Baker et al 2014).

The parameters of the risk measures are also set quite similarly across different studies. Starting with the metrics of total risk, the probability $\alpha$ for the VaR measures is usually set to 0.05 (see Gilli and Këllezi 2006). The proportion $q$ determining the threshold value in the EVaR approach is 0.10 because it is considered a minimum requirement for reliable estimation results (see McNeil and Frey 2000). Further, the number of significant continuous drawdowns for the ADD and the DDD is chosen to be $K=5$ (see Eling 2008), and the minimal acceptable return $\tau$ for the LPM is zero because most studies work with excess returns (see Schuhmacher and Eling 2012). As far as the measures of systematic risk are concerned, betas are typically computed with respect to the CRSP value-weighted market index (see Baker et al 2011, 2014; Li et al 2014, 2016). Finally, to calculate idiosyncratic volatilities, we require a proxy for the risk-free rate (the US Treasury bill rate), the market risk premium and the returns of the Fama and French (1993) and Carhart (1997) factor-mimicking portfolios (see Ang et al 2009). They are obtained from Kenneth French’s online data library (https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html). Note that these data are also used in our portfolio excess return and alpha calculations.

2.3 Equity data sources

Our empirical analysis focuses on US stocks. Specifically, we follow Frazzini and Pedersen (2014) by extracting all available common stocks on the CRSP daily US stock database (database property “shrcd” is equal to 10 or 11) between January 1926 and December 2014. This results in a sample size of 24 126 relevant stocks. The returns for these stocks are given in US dollars (including dividend payments). Excess returns are above the US Treasury bill rate. In accordance with Bali et al (2011), the size, book-to-market, momentum and liquidity variables, which are relevant for conducting firm-level regressions, are calculated using the standard definitions of the literature and using data from CRSP/Compustat and Thomson Reuters Datastream.

3 Checking for anomaly existence

3.1 Quantile portfolios

We start our empirical analysis by investigating whether our universe of CRSP stocks shows evidence of an inverse risk–return relationship for all our risk measures. To this end, we construct equal-weighted quantile portfolios derived from univariate risk sorts because they are often used to detect the first signs of anomalous risk effects in the cross section of stock returns (see, for example, Baker et al 2011; Dutt and Humphery-Jenner 2013). For each risk measure and at the end of each month, we sort the stocks into six quantile portfolios based on their risk realizations in that month. We use six portfolios for better visualization of our tables in the online appendix; similar results can be obtained for decile portfolios. These portfolios are held for one month and are then rebalanced to maintain equal weights. The means, SDs and Sharpe ratios for the monthly excess returns of these quantile portfolios from January 1931 to December 2014 are illustrated in Figure 1, where P1 (P6) denotes the portfolio of stocks with the lowest (highest) risk. The specific design of the figure is close to typical box–whisker plots. Using the subplot for the Sharpe ratio as our example, for each quantile, it draws a bullet presenting the average of the portfolio Sharpe ratios over all risk measures. This information is supplemented by the corresponding bandwidths of Sharpe ratios indicating the highest and lowest Sharpe ratios entering the averages.
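The monthly sorting step can be sketched for a single cross section and risk measure as follows (toy data; the function name and values are ours, and the real procedure repeats this every month with one-year risk estimates):

```python
import numpy as np

def quantile_portfolios(risk, next_ret, n_q=6):
    """Sort stocks into n_q equal-weighted portfolios by their risk
    realization and return each portfolio's mean next-month return
    (P1 = lowest risk, ..., P6 = highest risk)."""
    order = np.argsort(risk)              # ascending risk
    groups = np.array_split(order, n_q)
    return np.array([next_ret[g].mean() for g in groups])

# Toy cross section in which riskier stocks earn lower returns.
risk = np.arange(12.0)
next_ret = 0.01 - 0.001 * np.arange(12.0)
p = quantile_portfolios(risk, next_ret)
```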

Our results indicate the existence of 25 low-risk anomalies, one for each risk measure. The question of whether these 25 effects are independent or whether they actually reflect just one anomaly is analyzed in Section 4. For now, we state the following observations. For each risk measure, the mean excess returns (SDs) are almost monotonically declining (increasing) when moving from the low-risk to the high-risk quantile. An application of the Patton and Timmermann (2010) monotonicity test shows that the decline in means is both economically and statistically significant. As a result, the Sharpe ratios show a similar decline. This finding has three important implications. First, there are anomalies for all risk categories: systematic risk (betas), idiosyncratic risk (idiosyncratic volatilities) and total risk (the remaining measures). Second, more complex risk measures do not appear to offer substantial benefits in terms of a solution to the low-risk puzzle. Third, and closely related to this observation, measures extracting only a small amount of information from the historical return distribution generate anomalous effects as well. For example, our findings hold even for the simplest of all risk measures, the MIN. Thus, in contrast to the more complex measures, only a single return observation is required to document an anomaly. Further, the choice of return model used to estimate idiosyncratic volatilities has no crucial influence; again, the simplest model (the market model) would be sufficient to observe an anomaly. In contrast to Bali and Cakici (2008), who find no evidence of an idiosyncratic volatility effect in equally weighted portfolios, we can detect the low-risk anomaly in such portfolios; this may be partially because we focus on a larger time series sample (1926–2014 versus 1958–2004). A similar argument can be made for the order of the LPMs because the orders 1, 2 and 3 produce similar results.
Finally, a comparison between traditional VaR and EVaR shows that the downward performance slope generated by the more advanced peak-over-threshold approach is very similar to that obtained with HS. Again, a less complex measure could be favored.

3.2 Arbitrage portfolios

Besides documenting the existence of an inverse risk–return relationship for each of our risk measures via quantile portfolios, it is instructive to look at the performance of arbitrage portfolios designed to exploit the return difference between low- and high-risk stocks. This is because such portfolios are valuable for the construction of investment funds, and they can serve as explanatory variables in multifactor asset pricing models (see Frazzini et al 2013; Auer and Schuhmacher 2015). Our full set of arbitrage portfolio returns is available from the authors upon request. More importantly, they also allow a direct comparison of low-risk anomaly magnitudes across our different risk measures, and they serve as the basis of our spanning tests in Section 4.2.3.

We construct a special case of the Frazzini and Pedersen (2014) BAB factor, which can be made applicable to each of our risk measures. Specifically, we build our betting-against-risk (BAR) arbitrage portfolios as follows. For each risk measure, all stocks are ranked in ascending order on the basis of their risk realization. The ranked stocks are assigned to either a low-risk or a high-risk subportfolio. The low-risk (high-risk) subportfolio is composed of all stocks with a risk realization below (above) the median risk. In each subportfolio, stocks are weighted based on their ranked risk realizations: that is, lower-risk stocks have larger weights in the low-risk portfolio and higher-risk stocks have higher weights in the high-risk portfolio. Note that the weights are derived from ranks and not directly from risk realizations (see Asness et al 2014); in this way, we can avoid the distorting effects of outliers. The subportfolios are rebalanced every calendar month. The arbitrage portfolio is then obtained as a self-financing portfolio that is long in the low-risk subportfolio and short in the high-risk subportfolio. In contrast to Frazzini and Pedersen (2014), our BAR subportfolios are not rescaled to the risk of the market because this is not reasonable for some of our risk measures. The BAB factor can be generalized to $r^{\mathrm{BAR}}_{t+1}=\lambda^{\mathrm{M}}_{t}(\lambda_{t}^{\mathrm{L}})^{-1}(r_{t+1}^{\mathrm{L}}-r_{\mathrm{f}})-\lambda^{\mathrm{M}}_{t}(\lambda_{t}^{\mathrm{H}})^{-1}(r_{t+1}^{\mathrm{H}}-r_{\mathrm{f}})$, where $r_{t+1}^{\mathrm{L}}=r^{\prime}_{t+1}w^{\mathrm{L}}$ and $r_{t+1}^{\mathrm{H}}=r^{\prime}_{t+1}w^{\mathrm{H}}$ are the returns of the low- and high-risk portfolios, $\lambda_{t}^{\mathrm{L}}$ and $\lambda_{t}^{\mathrm{H}}$ are the corresponding portfolio risks, and $\lambda^{\mathrm{M}}_{t}$ is the risk of the market (the CRSP value-weighted market index). Thus, the long side is leveraged to the risk of the market and the short side is deleveraged to the risk of the market. Depending on the risk measure, portfolio risks can be estimated either via $\lambda_{t}^{\mathrm{L}}=\lambda^{\prime}_{t}w^{\mathrm{L}}$ and $\lambda_{t}^{\mathrm{H}}=\lambda^{\prime}_{t}w^{\mathrm{H}}$ or via using the constituents at time $t$ to calculate the daily portfolio returns in the preceding year and estimating the subportfolio risks based on these returns. For some of our risk measures, $\lambda^{\mathrm{M}}_{t}$ can be obtained using a rolling one-year window of daily data; the scaling is unreasonable for idiosyncratic volatility because the market portfolio is fully diversified.
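The rank-based weighting can be sketched as follows. This is an illustrative variant under our own normalization (each leg scaled to one, ranks centered at the median); Frazzini and Pedersen's exact scheme centers ranks at their mean, so the function and toy values should not be read as the paper's implementation:

```python
import numpy as np

def bar_weights(risk):
    """Rank-based weights for a BAR portfolio (a sketch): stocks below
    the median risk rank form the long leg, stocks above it the short
    leg, with weights proportional to the rank's distance from the
    median so that the least risky stocks get the largest long weights.
    Each leg is normalized to one, giving a self-financing portfolio."""
    ranks = risk.argsort().argsort() + 1.0   # 1 = lowest risk
    med = np.median(ranks)
    long_ = np.where(ranks < med, med - ranks, 0.0)
    short = np.where(ranks > med, ranks - med, 0.0)
    return long_ / long_.sum() - short / short.sum()

risk = np.array([0.3, 0.1, 0.5, 0.2, 0.6, 0.4])  # toy risk realizations
w = bar_weights(risk)
```

The monthly BAR return is then the weighted sum of the next month's stock returns under these weights.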

Figure 2 provides an overview of the Sharpe ratios and the Fama–French–Carhart (four-factor) alphas for the BAR portfolios using our entire set of CRSP stocks. Note that an application of the Fama and French (2015) five-factor model and the Hou et al (2015) $q$-factor model delivers similar results on alpha significance; this is in line with the evidence on risk-sorted portfolios presented by Hou et al (2017). We focus on the four-factor model because its variables are still dominant in current research (see Cakici and Tan 2014; Cakici et al 2016). Figure 2 also displays results for a reduced subset of stocks, which we will discuss shortly. With respect to the CRSP set, we find that, with monthly values of around 0.20, the Sharpe ratios of all BAR portfolios are quite similar in magnitude. An application of the Ledoit and Wolf (2008) test, which is a bootstrap-based generalization of the procedure in Jobson and Korkie (1981), reveals no statistically significant differences at conventional significance levels. We use its standard setting with 5000 bootstrap repetitions and automatic block length selection from the predetermined candidate sizes 1, 3, 6, 10 and 15. The alphas are highly significant (both statistically and economically) for all risk measures. They show values of around 1.50% (total risk and idiosyncratic volatility) and 1.00% (betas) per month, indicating that size, book-to-market and momentum factors cannot fully explain the BAR returns. Well-known limitations of the classic size factor contribute to this result (see De Moor and Sercu 2013; Asness et al 2015).

To explore whether the detected alpha performance differences between betas and the other risk measures are systematic, Figure 3 reports the four-factor alphas of the arbitrage portfolios in five subsamples used by Frazzini and Pedersen (2014), ie, 1931–50, 1951–70, 1971–90, 1991–2010 and 2011–14. We observe that, in the first two subperiods, the performance of the BAR portfolios is rather low and similar across all risk measures. In contrast, the more recent subperiods display significantly higher performance of the arbitrage strategies, as well as noteworthy differences between the performance linked to measures of total risk and the remaining measures. Especially in the last subperiod, the textbook definition of beta yields the weakest anomaly strength. These features can be explained as follows. First, recent decades have been characterized by significant advances in trading technology and low transaction costs, which stimulate arbitrage activity (see French 2008; Hendershott et al 2011; Chordia et al 2014). Consequently, the returns of the benchmark factors have declined over time, making it easier for a new strategy to outperform them. Second, in contrast to the other risk measures, $\smash{\beta^{\mathrm{TB}}}$ has received more public attention in a low-risk anomaly context, such that arbitrage forces are naturally stronger for this measure than for others (see McLean and Pontiff 2016; Jacobs and Müller 2017).

3.3 Liquidity considerations

Leading studies on the low-risk anomaly are based on the entire CRSP data set (see Baker et al 2014; Frazzini and Pedersen 2014), which naturally includes less liquid stocks. However, some studies have argued that the anomaly tends to be concentrated among illiquid stocks and becomes practically irrelevant after their exclusion (see, for example, Li et al 2014). In this last section of our low-risk anomaly documentation, we show that our 25 low-risk anomalies and their similar magnitudes also occur in universes containing many (or exclusively) liquid stocks. We start with a simple but powerful illustrative example. That is, we obtain BAR portfolios for the 30 stocks of the DJIA index, which represent about 25% of the market capitalization of all New York Stock Exchange (NYSE) stocks and are additionally characterized by high media coverage and inclusion in a wide variety of international investment products (see Brock et al 1992).[19: To avoid survivorship bias, we focus not on the current constituents but rather on the historic constituents, and we consider them only in the phases in which they were included in the index (see Taylor 2014).] Figure 2, reporting the Sharpe ratios and four-factor alphas of these portfolios, shows that, when concentrating on these highly liquid stocks, the anomalies weaken but are still present at economically relevant levels. For example, alphas amount to around 0.60% per month.[20: On an annual basis this yields about 7.20% (in excess of the market and popular investment strategies), which is quite impressive for self-financing portfolios based on only a handful of stocks.] In a comparison with the CRSP results, we can observe that the low-risk anomalies are certainly strong among smaller, illiquid stocks but also that they are not confined to this market segment. The anomalies exist even in a setting where the conditions for efficient arbitrage are ideal (see Chordia et al 2008).

In addition to this exercise, we perform some other calculations. First, we apply a filter similar to those adopted by Lo et al (2000) and Dutt and Humphery-Jenner (2013). That is, we exclude stocks with insufficient trading activity as measured by the illiquidity indicator of Lesmond et al (1999). Specifically, we calculate the ratio of zero returns to the total number of returns for each stock, and we eliminate those with a value above 0.25 at portfolio formation.[21: For the CRSP database, this filter roughly corresponds to a focus on the 50% of stocks with the highest market value (see Lesmond et al 1999).] This way we account for the fact that the liquidity of stocks changes over time (see Pástor and Stambaugh 2003) because we exclude stocks only in phases of pronounced illiquidity.[22: Similarly, we consider other exclusion methods. For example, we eliminate stocks with prices below USD5 per share (see Jegadeesh and Titman 2001; Li et al 2014), with a size that would place them in a micro-cap quantile (see Jegadeesh and Titman 2001; Fama and French 2008) or with an Amihud (2002) illiquidity measure, calculated following Bali et al (2011), below a critical level, ensuring the same number of stocks as in our Lesmond et al (1999) selection. Further, we alternatively concentrate on the 1000 stocks with the highest market capitalization (see Baker et al 2011).] Second, while our previous methods reduce the number of stocks, a last variant uses the entire CRSP database but adds the Pástor and Stambaugh (2003) liquidity factor to the alpha regressions.[24: URL: https://faculty.chicagobooth.edu/lubos.pastor/research.] In each case, our main conclusions hold.[23: The detailed results of these computations are available from the authors upon request.]
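The zero-return screen in the style of Lesmond et al (1999) can be sketched as follows. This is a simplified illustration: the container `returns_by_stock` and both helper names are hypothetical, and a real application would reapply the filter at each portfolio formation date:

```python
def zero_return_ratio(daily_returns):
    """Lesmond et al (1999)-style illiquidity proxy:
    share of zero daily returns over the estimation window."""
    zeros = sum(1 for r in daily_returns if r == 0.0)
    return zeros / len(daily_returns)

def liquid_universe(returns_by_stock, cutoff=0.25):
    """Keep stocks whose zero-return ratio is at or below the cutoff;
    stocks are excluded only in phases of pronounced illiquidity."""
    return [s for s, rets in returns_by_stock.items()
            if zero_return_ratio(rets) <= cutoff]
```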

4 Exploring anomaly linkages

So far, we have observed 25 low-risk anomalies with similar risk-adjusted performance of the corresponding arbitrage portfolios. One explanation for such a result would be that the anomaly originates from the textbook beta (the risk measure for which the anomaly has been documented most extensively) and that all ARMs simply approximate this dominant effect. However, in the light of the lower performance of beta arbitrage portfolios, we could also argue the reverse: the low-risk anomaly might originate from one of the ARMs and be approximated by the textbook beta (and the remaining measures). The latter presumption is reasonable because beta measures systematic risk, which is a component of total risk (see Ben-Horim and Levy 1980). To analyze such potential linkages between observed market effects, the literature uses bivariate portfolio sorts, firm-level cross-sectional regressions and spanning tests. We implement these methods below, after a preliminary analysis of the rank correlations among our risk measures.

4.1 Rank correlations

Figure 4 presents some properties of the pairwise rank correlations between our conceptually different risk measures. That is, for each month in our sample period, we calculate Spearman rank correlations and report their time series averages and SDs in the form of a heatmap.[25: Some examples illustrating the evolution of the correlation over time are given in Section A of our online appendix.] As we can see, rank correlations between and among measures of total risk and idiosyncratic volatility are very high. Rank correlations between these measures and the different forms of beta are lower but still positive and economically relevant. As far as the time variation of rank correlations is concerned, the SDs are rather low, but they show some higher values when betas are involved.
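The month-by-month rank correlation summary can be sketched as below. This is a minimal illustration with our own function names; it assumes no ties in the monthly risk measure realizations (the double-`argsort` ranking does not average tied ranks):

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation via Pearson correlation of ranks
    (no tie averaging, so it assumes distinct values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def corr_summary(monthly_x, monthly_y):
    """Time-series mean and SD of month-by-month rank correlations
    between two risk measures (one cross section per month)."""
    rhos = np.array([spearman(x, y) for x, y in zip(monthly_x, monthly_y)])
    return rhos.mean(), rhos.std(ddof=1)
```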

Zakamouline (2011) points out that, especially in large cross-sectional samples, there can be significant differences in ranks, which are concealed by rather high correlation coefficients. Thus, as we will see in Section 4.2, a high rank correlation between different risk measures should not deceive us into believing that they all contain the same investment-relevant information and that none has an advantage over the others.

4.2 Anomaly-separating methodologies

4.2.1 Bivariate portfolio sorts

Among our anomaly-separating techniques, we first look at equal-weighted quantile portfolios from bivariate risk sorts, because multivariate portfolio sorts are considered the simplest way to analyze whether one cross-sectional effect subsumes the other or whether two cross-sectional effects are separate phenomena (see, for example, Chan et al 1996; Bali et al 2011; Novy-Marx 2015). We use two kinds of sorting. In the first, we sort on each ARM while controlling for $\smash{\beta^{\mathrm{TB}}}$. To this end, we form quantile portfolios ranked based on $\smash{\beta^{\mathrm{TB}}}$, and within each $\smash{\beta^{\mathrm{TB}}}$ quantile, we sort stocks into quantile portfolios based on a given ARM such that quantile 1 (quantile 6) contains stocks with the lowest (highest) ARM levels. This produces 36 (6 $\times$ 6) portfolios for each risk measure. Following Bali et al (2011), we then calculate the average excess returns across the $\smash{\beta^{\mathrm{TB}}}$ control quantiles to create quantile portfolios with dispersion in a given ARM but containing all levels of $\smash{\beta^{\mathrm{TB}}}$. In other words, for each risk measure, we obtain six portfolios P1,…, P6 that differ in their ARM levels but have approximately the same levels of $\smash{\beta^{\mathrm{TB}}}$. The second kind of bivariate sort we perform is a reverse sort, ie, we sort by $\smash{\beta^{\mathrm{TB}}}$ while controlling for an ARM. Thus, we first form quantile portfolios ranked based on the ARM. Then, within each quantile, we sort stocks into quantile portfolios ranked based on $\smash{\beta^{\mathrm{TB}}}$ so that quantile 1 (quantile 6) contains the stocks with the lowest (highest) $\smash{\beta^{\mathrm{TB}}}$. Again, we average across the control quantiles to generate portfolios P1,…, P6, with dispersion in $\smash{\beta^{\mathrm{TB}}}$ but with similar ARM levels.
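The dependent sort-and-average procedure can be sketched as below. This is a simplified illustration (equal-sized bins via `np.array_split`, equal-weighted bin returns); the paper's actual quantile breakpoints and weighting may differ:

```python
import numpy as np

def dependent_sort_returns(control, target, fwd_ret, n=6):
    """Dependent n x n sort: rank stocks on the control measure, then on
    the target measure within each control bin, and average next-month
    returns of each target bin across control bins (Bali et al 2011
    style). Returns mean returns of P1 (low target risk) ... Pn (high)."""
    control, target, fwd_ret = map(np.asarray, (control, target, fwd_ret))
    ctrl_bins = np.array_split(np.argsort(control), n)
    avg = np.zeros(n)
    for idx in ctrl_bins:
        # order this control bin's stocks by the target risk measure
        tgt_order = idx[np.argsort(target[idx])]
        for q, sub in enumerate(np.array_split(tgt_order, n)):
            avg[q] += fwd_ret[sub].mean()
    return avg / n
```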

Table 2 presents some properties of our portfolios sorted by ARM ($\smash{\beta^{\mathrm{TB}}}$) after controlling for $\smash{\beta^{\mathrm{TB}}}$ (ARM). We report their monthly Sharpe ratios and a summary measure that allows quick judgment of whether the low-risk anomaly occurs in the bivariate sorts. Specifically, we construct a long–short portfolio based on the lowest and highest quantiles, and we calculate the corresponding test statistic of the Bailey and López de Prado (2012) test, which is a nonnormal generalization of the well-known Lo (2002) test for evaluating the statistical significance of Sharpe ratios.[26: This test uses the test statistic $\widehat{\mathrm{SR}}/\hat{\sigma}_{\widehat{\mathrm{SR}}}$, where $\widehat{\mathrm{SR}}$ denotes the estimated Sharpe ratio and its standard error is $\hat{\sigma}_{\widehat{\mathrm{SR}}}=[(T-1)^{-1}(1-\hat{\gamma}\widehat{\mathrm{SR}}+\tfrac{1}{4}(\hat{\kappa}-1)\widehat{\mathrm{SR}}^{2})]^{0.5}$, with $\hat{\gamma}$ and $\hat{\kappa}$ denoting return skewness and kurtosis, respectively. This test statistic is asymptotically normally distributed even if the returns are not.] Interestingly, for both types of portfolios, we can observe falling Sharpe ratios as we move from P1 to P6. Further, in many cases, we detect significant (positive) Sharpe ratios for the spread portfolios in both control directions. Results of this kind from double sorts are typically interpreted as evidence that one cross-sectional effect does not approximate the other, or that both effects are separate phenomena (see Chan et al 1996; Bali et al 2011; Novy-Marx 2015). In our case, the $\smash{\beta^{\mathrm{TB}}}$ effect and many of the ARM effects would be seen as (pairwise) separate effects.
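The test statistic just described can be sketched directly from its formula. This is an illustration only, using population moments and nonexcess kurtosis as in the definition above:

```python
import math

def sr_test_stat(returns):
    """Bailey and Lopez de Prado (2012) statistic SR_hat / sigma_hat,
    asymptotically standard normal even for nonnormal returns."""
    T = len(returns)
    mu = sum(returns) / T
    var = sum((r - mu) ** 2 for r in returns) / T
    sd = math.sqrt(var)
    sr = mu / sd
    skew = sum((r - mu) ** 3 for r in returns) / (T * sd ** 3)
    kurt = sum((r - mu) ** 4 for r in returns) / (T * sd ** 4)  # nonexcess
    se = math.sqrt((1 - skew * sr + 0.25 * (kurt - 1) * sr ** 2) / (T - 1))
    return sr / se
```

For symmetric two-point returns the skewness term vanishes and the kurtosis term equals one, so the statistic collapses to $\widehat{\mathrm{SR}}\sqrt{T-1}$.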

Besides looking at $\smash{\beta^{\mathrm{TB}}}$, we perform additional bivariate sorts where we replace $\smash{\beta^{\mathrm{TB}}}$ in the two sorts described above by each of the ARMs. In other words, we control each risk measure for each other risk measure. Figure 5 visualizes the results of these calculations. As in Table 2, we calculate a Bailey and López de Prado (2012) test statistic for every possible sort. For a better interpretation of the presented heatmap, note that the values in column 17 (row 17), ie, for $\smash{\beta^{\mathrm{TB}}}$, are identical to those in part (a) (part (b)) of Table 2. Looking at the entire picture and following the classic interpretation, we can say that many risk measures appear to represent separate cross-sectional effects. While such a result seems reasonable when differentiating measures of total, systematic and unsystematic risk, it is more surprising when, for example, controlling a measure of total risk against another measure of total risk. This is because we might expect one risk measure of this category to be as good as the other because they tend to be highly correlated (see Pfingsten et al 2004; Eling and Schuhmacher 2007; Bali et al 2011). For example, $\mathrm{VaR}^{\mathrm{HS}}$ and $\mathrm{VaR}^{\mathrm{ND}}$ are highly correlated (see Figure 4), but the statistics at the coordinates (7,8) and (8,7) in the heatmap matrix are highly significant. Thus, despite their high correlation, each appears to contain distinctive information for investors.

4.2.2 Cross-sectional regressions

After presenting our first evidence from bivariate portfolio sorts, we now leave the portfolio level and conduct firm-level cross-sectional regressions, as suggested by Fama and MacBeth (1973), which represent another popular approach to distinguishing between cross-sectional effects. While the use of portfolio sorts in the empirical analysis of cross-sectional effects has the advantage of being nonparametric, in the sense that it does not impose a functional form on the relationship between a predictor variable and future returns, it has three potentially significant disadvantages (see Bali et al 2011). First, it eliminates a large amount of information in the cross section via aggregation. Second, it is a difficult setting in which to simultaneously control for multiple factors. Third, dependent bivariate sorts on correlated variables may not sufficiently control for the control variable.[27: For example, there could be some residual variation in $\smash{\beta^{\mathrm{TB}}}$ across the ARM portfolios in Table 2(a).] Therefore, empirical studies typically supplement their calculations by implementing independent bivariate sorts or firm-level Fama and MacBeth (1973) regressions (see Gebhardt et al 2005). Because, in our application, the independent sorts generate very similar results to the dependent sorts (as in, for example, Diether et al 2002), this section focuses on cross-sectional regressions.

We use two regression settings. Specifically, we estimate initial regressions with only one risk variable (single-risk setting), and we then consider extended versions of these regressions in which we add a second risk variable as a control (multirisk setting). By comparing the results of both regressions, we can judge whether one cross-sectional effect subsumes the other. For example, we estimate single-risk regressions for SD and $\smash{\beta^{\mathrm{TB}}}$. Then, we conduct a multirisk regression with both SD and $\smash{\beta^{\mathrm{TB}}}$ as explanatory variables. This way, we control SD for $\smash{\beta^{\mathrm{TB}}}$ and $\smash{\beta^{\mathrm{TB}}}$ for SD at the same time. Should SD ($\smash{\beta^{\mathrm{TB}}}$) turn insignificant in comparison with the single-risk setting, then $\smash{\beta^{\mathrm{TB}}}$ (SD) would be the dominant effect, which is the typical interpretation of such controls involving potentially related effects (see Novy-Marx 2015). Similarly, Chordia et al (2014, p. 46) state: “the significance of an anomaly in the presence of others indicates that each anomaly exerts an independent and significant influence on returns”.

We start with the single-risk regressions. For each risk measure and each month, we run a cross-sectional regression of stock returns in that month on the risk measure realizations in the previous month (and on the traditional characteristics, ie, size, book-to-market, momentum and liquidity, as in Bali et al (2011) and de Groot and Huij (2018)). We then calculate the time series averages of the risk-related cross-sectional slope coefficients, and we evaluate their significance. In the literature, it has become standard to perform this evaluation using $t$-statistics based on robust Newey and West (1987) standard errors (see, for example, Chordia et al 2009; Chui et al 2010; Loughran and McDonald 2011; Chen et al 2013; Bartram et al 2015; Choi and Choi 2018), even though measurement error in the explanatory variables can bias parameter estimates and standard errors. This course of action can be traced back to the fact that suitable methods to address this problem are still under active research (see Chordia et al 2017; Jegadeesh et al 2019). We do not wish to enter this debate on methodology refinement; instead, we follow the established standard approach.[28: We also followed Gebhardt et al (2005) by implementing the approach of Brennan et al (1998), which mitigates the problem of using estimated quantities. However, this did not crucially influence our main results.]
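A stripped-down sketch of the Fama–MacBeth procedure follows. It uses a single risk variable with no characteristic controls, and a fixed Newey–West lag of 4 (with Bartlett weights) chosen purely for illustration:

```python
import numpy as np

def fama_macbeth(risk_by_month, ret_by_month, nw_lag=4):
    """Month-by-month cross-sectional OLS of returns on the lagged risk
    measure; returns the time-series mean slope and a Newey-West
    t-statistic for it. Characteristic controls are omitted here."""
    slopes = []
    for lam, r in zip(risk_by_month, ret_by_month):
        X = np.column_stack([np.ones(len(lam)), lam])
        b = np.linalg.lstsq(X, np.asarray(r, dtype=float), rcond=None)[0]
        slopes.append(b[1])
    g = np.array(slopes)
    T, mean = len(g), g.mean()
    d = g - mean
    # Newey-West long-run variance of the slope series (Bartlett weights)
    lrv = d @ d / T
    for lag in range(1, min(nw_lag + 1, T)):
        w = 1 - lag / (nw_lag + 1)
        lrv += 2 * w * (d[lag:] @ d[:-lag]) / T
    return mean, mean / np.sqrt(lrv / T)
```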

Table 3(a) reports the risk-related slope coefficients and $t$-statistics from our single-risk regressions. Supporting Baker et al (2011) and Frazzini and Pedersen (2014), the regressions provide evidence of a highly significant negative relationship between betas and future stock returns. Further, in line with our results from the univariate portfolio sorts, the negative relationship between risk and return can also be observed for all other risk measures. As far as the absolute magnitude of the coefficient estimates is concerned, we find higher values for measures of total and unsystematic risk than for betas, which (apart from their signs) partially supports the theory of Merton (1987) and our earlier discussion on beta-related arbitrage.

The second regression setting extends the single-risk regressions to multirisk regressions by adding the lagged $\smash{\beta^{\mathrm{TB}}}$ as an additional explanatory variable. This approach is the more sophisticated counterpart to the multivariate portfolio sorts of Table 2, and it is a standard way of controlling existing cross-sectional models for variables that may provide superior explanatory power (see Tetlock 2010; Bali et al 2011; Ortiz-Molina and Phillips 2014). Note that potential multicollinearity (because of correlated risk measures) has no distorting effect on our regression results, for three reasons. First, multicollinearity does not generally bias the estimated regression coefficients but results in larger variances of the least-squares estimator (Harvey 1977); as we will see, the overall picture in our application is nevertheless significant.[29: This is why researchers sometimes include correlated variables without a discussion of potential problems (see, for example, Brennan et al 1998; Hou and Moskowitz 2005).] Second, we followed Bali et al (2011) by repeating our estimations with orthogonalized control risk measures.[30: For a critical discussion of this approach, see Kennedy (1982).] As expected, this did not influence our main conclusions. Finally, as pointed out in Section 4.1, high rank correlation does not necessarily imply that two variables have strongly similar information content.

Table 3(b) presents the results for the multirisk model. The coefficients of all ARMs remain negative and significant after the inclusion of the lagged $\smash{\beta^{\mathrm{TB}}}$. Similarly, the coefficient of $\smash{\beta^{\mathrm{TB}}}$ keeps its negative sign and, in many cases, remains significant after the inclusion of a lagged ARM.[31: Bali et al (2011) find a positive coefficient on beta in some of their analogously augmented regressions. However, they point out that their results “should be interpreted with caution since beta is estimated over a month using daily data, and thus, is subject to a significant amount of measurement error” (Bali et al 2011, p. 437).] The typical interpretation of this kind of result is that the risk measures (eg, EVaR and $\smash{\beta^{\mathrm{TB}}}$) describe separate effects and that more than one measure of risk should thus be used in the cross-sectional asset pricing equation. Likewise, it advises against dropping beta from cross-sectional pricing equations and relying solely on idiosyncratic volatility, as in Jiang et al (2009).

Figure 6 considers all possible pairs of risk measures and reports their $t$-statistics (and significances) in multirisk regressions.[32: As in Section 4.2.1, column 17 and row 17 of the heatmap reflect the results in Table 3.] Although the number of pairwise significances is lower than in the bivariate sorts summarized in Figure 5, it is still considerable. We find many instances calling for more than one risk variable in the cross section of stock returns.[33: Harvey et al (2016) argue that, given the extensive number of factors attempting to explain the cross section of stock returns and the related data mining problems, using the usual $t$-statistic cutoff of 2.0 for establishing statistical significance may be inappropriate. They introduce a new multiple testing framework, and they provide the important result that a new factor must clear a much higher hurdle, with a $t$-statistic greater than 3.0. In our application, such a decision rule does not lead to crucial changes.] Using the portfolio interpretation of cross-sectional regressions,[34: Full details on this interpretation are given in Section B of our online appendix.] we can also say that relying on additional risk measures in stock selection strategies increases mean excess returns because they can add new information not already contained in the risk measure (and other selection criteria) currently in use.

4.2.3 Spanning tests

Novy-Marx (2015) warns that multivariate portfolio sorts are far too coarse and suggests time series regressions, or spanning tests, to distinguish between presumably different cross-sectional effects. These tests are more robust to measurement error than cross-sectional regressions, and they do not require parametric assumptions regarding the functional form of the relationship between expected returns and the predictive variables. To conduct a spanning test we have to regress the returns of an arbitrage portfolio for one cross-sectional effect on the returns of arbitrage portfolios built based on other effects of interest. Using these regressions, we can evaluate whether the left-hand effect (or test strategy) generates significant alpha relative to the right-hand effects (or explanatory strategies). Significant (insignificant) alphas suggest that an investor already trading the explanatory strategies can realize significant (little) gains by starting to trade the test strategy.

In our application, we perform two kinds of spanning tests using the BAR arbitrage portfolios constructed in Section 3.2. In the first, we regress the BAR portfolio returns for each ARM on the BAR portfolio returns for $\smash{\beta^{\mathrm{TB}}}$ (and on factor portfolios for size, book-to-market, momentum and liquidity, as in Frazzini and Pedersen (2014) and Novy-Marx (2015)).[35: Note that, in comparison with our previous calculations, this approach reduces the sample size because the Pástor and Stambaugh (2003) liquidity factor was not available before 1968.] In the second kind of spanning test, we mirror the positions of the risk variables. That is, we regress the portfolio returns for $\smash{\beta^{\mathrm{TB}}}$ on the portfolio returns for each ARM (and on the other factor portfolios).
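A minimal spanning-test sketch follows. It extracts only the intercept (alpha); significance testing via Newey–West standard errors is omitted for brevity, and the function name is ours:

```python
import numpy as np

def spanning_alpha(test_ret, explan_rets):
    """OLS time-series regression of a test strategy's returns on one or
    more explanatory strategy/factor return series; returns the
    intercept (alpha) of the regression."""
    X = np.column_stack([np.ones(len(test_ret))] + list(explan_rets))
    coef = np.linalg.lstsq(X, np.asarray(test_ret, dtype=float),
                           rcond=None)[0]
    return float(coef[0])
```

A significant alpha indicates that the test strategy is not spanned by the explanatory strategies, ie, it adds value on top of them.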

Table 4 reports the alphas and the corresponding Newey and West (1987) $t$-statistics of these spanning tests. With few exceptions, the tests yield significant alphas in both test directions. This suggests that investors can benefit from supplementing a trading strategy based on $\smash{\beta^{\mathrm{TB}}}$ (an ARM) and other well-known market anomalies with a strategy based on an ARM ($\smash{\beta^{\mathrm{TB}}}$).

To cover all risk measures, the entries in the heatmaps of Figure 7 present the alpha $t$-statistics of similar regressions of the row risk measure portfolio on the column risk measure portfolio. Again, we find many instances of independent effects.[36: This also holds when regressing only BAR portfolios on each other, ie, when excluding the factor portfolios for size, book-to-market, momentum and liquidity (similar to Novy-Marx 2012).] The spanning tests confirm our previous overall findings. Further, there are many cases in which our three anomaly-separating methodologies reach the same outcome. For example, all methodologies agree that investors trading based on SD (and other factors) can enhance their portfolio performance by additionally trading based on MDD, and vice versa. However, we also see that the choice of anomaly-separating methodology can influence the results for specific pairs of risk measures. For example, while we detect separate effects for IVOL (FFM) and IVOL (CHM) in the portfolio sorts and cross-sectional regressions, this is no longer true in the spanning tests. Given that many studies in the finance literature rely almost exclusively on the two former methods, such observations raise doubts about the robustness of their results that are similar to the classic data mining criticism (see Lewellen et al 2010; Harvey et al 2016).

5 Related cross-sectional effects and robustness

In this section we analyze whether the information content of our risk measures is similar to the variables discussed in recent studies on investment risk and the cross section of stock returns. In other words, we investigate whether our measures are proxies for these variables or vice versa. Further, we outline the robustness of our results in different research settings.

5.1 Max and skewness

5.1.1 The max effect

Motivated by the empirical evidence that certain groups of individual investors have a preference for lottery-like stocks (see Kumar 2009), Bali et al (2011) examine the role of extreme positive returns in the cross-sectional pricing of US stocks, and they find a significantly positive return difference between stock portfolios containing stocks with low and high maximum daily returns. Bali et al (2011) argue that this evidence (and the persistence of the maximum return) suggests that investors are willing to pay more for stocks that have a small probability of a large positive return, which is consistent with the cumulative prospect theory as modeled in Barberis and Huang (2008) and with the optimal beliefs framework of Brunnermeier et al (2007).

Even though the maximum return is not a risk measure in the classic sense, some of our risk measures, especially the symmetric ones, may be linked to it. For example, stocks with high maximum returns are, almost by construction, also likely to exhibit high volatility measured using squared daily returns. Thus, it is plausible that some of our risk measures simply capture the max effect. To examine this issue, we add the maximum daily return of the previous year to our selection of risk measures, and we repeat our analysis.[37: We also used the average of the two, three, four or five highest returns instead of the single maximum daily return. However, the results are similar.]

Interestingly, we find that the stock rankings produced by the maximum return are highly correlated with those of our measures of total risk. Consequently, using the maximum return in the construction of a BAR portfolio, we obtain a highly significant monthly Sharpe ratio of 0.21 and a four-factor alpha of 1.37%, which is close to the values for our risk measures presented in Figure 2. Controlling for the maximum return provides a similar picture to the control exercises for our other risk measures. That is, for example, the $t$-statistics of the coefficients of the maximum return in multirisk cross-sectional regressions range from $-$9.34 to $-$2.52, while the $t$-statistics for the risk measures vary from $-$7.79 to $-$3.40. This is in line with the results of Bali et al (2011) for the relationship between the max effect, beta and idiosyncratic volatility. With an average alpha $t$-value of 2.09 (across all risk measures), spanning tests confirm the significance of the separate information contained in the maximum return.

5.1.2 The skewness effect

After controlling for the max effect, we analyze a potential link to the skewness of stock returns. Within a three-moment (mean, variance and skewness) asset pricing framework, Arditti (1967), Kraus and Litzenberger (1976) and Kane (1982) show that investors have a preference for positive skewness. As a result, assets that decrease (increase) a portfolio’s skewness are less (more) desirable and should command a higher (lower) expected return. Harvey and Siddique (2000) and Smith (2007) provide empirical evidence for the central prediction of these models.[38: Friend and Westerfield (1980), Sears and Wei (1985) and Barone-Adesi (1985) also cover the role of skewness in empirical asset pricing.] They show that systematic skewness, not idiosyncratic skewness, contributes to explaining the cross-sectional variation of stock returns. Specifically, stocks with lower systematic skewness tend to outperform stocks with higher systematic skewness.

To ascertain whether the information content of our risk measures is similar to that of skewness, we test the significance of the cross-sectional relationship between our risk measures and future stock returns after controlling for total, systematic and idiosyncratic skewness. To estimate these skewness measures, we use the same data window as for our risk measures, and we follow the skewness definitions of Harvey and Siddique (2000).[39: Total skewness (TSKEW) is the natural measure of the third central moment of returns. Systematic skewness (SSKEW), or coskewness, is the coefficient of a regression of excess returns on squared market excess returns, including the market excess return as a second regressor. Finally, idiosyncratic skewness (ISKEW) is the skewness of the residuals from this regression.] In our sample, and consistent with Boyer et al (2010) and Conrad et al (2013), we find that more positively skewed stocks tend to have lower returns. That is, the cross-sectional coefficients of the skewness measures in univariate firm-level cross-sectional regressions are negative. However, similarly to the results of Bali et al (2011), who calculate skewness-based measures over one, three, six and twelve month(s), we show that these effects are rather weak from an economic and statistical perspective.[40: This difference from the significant result of Boyer et al (2010) presumably stems from methodological differences, because they predict only portfolio returns rather than the returns on individual securities.] Nonetheless, we complete our analysis by controlling our risk measures for the different skewness measures. While the $t$-statistics for total skewness range from $-$6.91 to $-$1.36, the risk measures are all significant, with $t$-statistics between $-$7.76 and $-$3.52.
Similar results can be obtained for bivariate sorts and spanning tests and for the two other skewness measures.[41: Boyer et al (2010) argue that lagged skewness may not be a good predictor of future skewness because, in a rational market, it is expected future skewness that matters. We address this potential concern in our analysis as follows. First, we estimate cross-sectional regressions of skewness on lagged skewness. Further, we extend this model by using the classic control variables: beta, size, book-to-market, momentum and liquidity. We find that skewness is highly persistent, in both the univariate and multivariate contexts. Second, we use the fitted values from the month-by-month cross-sectional regressions as a measure of expected skewness, as in Boyer et al (2010). However, our conclusions remain unchanged.] There is no evidence that the low-risk anomaly we observe for different risk measures is eliminated by adding measures of skewness to the cross-sectional pricing equation.[42: This may change in a setting using intraday data (see Amaya et al 2015).] In other words, the anomaly is not the result of an omitted variable bias related to skewness.
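The Harvey and Siddique (2000)-style skewness decomposition used above can be sketched as follows. This is an illustration with our own names, using population moments; real excess-return inputs are assumed:

```python
import numpy as np

def skewness_measures(r_ex, mkt_ex):
    """TSKEW: total skewness of excess returns. SSKEW: systematic
    skewness (coskewness), ie, the loading on squared market excess
    returns in a regression that also includes the market excess return.
    ISKEW: skewness of the residuals from that regression."""
    r, m = np.asarray(r_ex, dtype=float), np.asarray(mkt_ex, dtype=float)
    tskew = float(np.mean((r - r.mean()) ** 3) / np.std(r) ** 3)
    X = np.column_stack([np.ones_like(m), m, m ** 2])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    resid = r - X @ coef
    iskew = float(np.mean((resid - resid.mean()) ** 3) / np.std(resid) ** 3)
    return tskew, float(coef[2]), iskew
```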

5.2 Sensitivity checks

To ensure that our results are not driven by any particular setting in our research design, we conduct a variety of supplementary calculations verifying their robustness.[43]

[43] As these robustness checks provide results similar to those in Sections 3 and 4, we only summarize their basic design. Detailed results are available from the authors upon request.

First, we repeat our bivariate portfolio sorts, cross-sectional regressions and spanning tests for the subsamples introduced in Section 3.2.[44] Second, as return outliers can drastically influence regression-based risk measures (see Knez and Ready 1997; Martin and Simin 2003) as well as other risk measures (see Cont et al 2010), we winsorize the returns at the 0.5% and 99.5% levels; that is, the smallest and largest 0.5% of the returns are set equal to the 0.5 and 99.5 percentiles, respectively.[45] Third, as far as the specification of our risk measures is concerned, we modify the length of the time window used for their calculation. The literature offers several possibilities. For example, Dutt and Humphery-Jenner (2013) show that the low-risk anomaly is qualitatively robust to calculating moving SDs based on 90-, 180-, 250-, 500- and 1000-day horizons. Ang et al (2006b, 2009) obtain idiosyncratic volatility over a short period of 30 days. In contrast, Li et al (2014) use 1, 36 and 60 months and find strong similarity across the different frequencies. Baker et al (2011, 2014) do not use daily data but resort to monthly data for the previous 60 months to estimate betas. Fourth, we change some risk measure parameters. We obtain our VaR-based risk measures for alternative values of $\alpha$, such as 1% and 10%, that are also frequently used in practice. Further, we vary the number $K$ of relevant drawdowns for the ADD and the DDD from 1 to 10 in steps of 1. We also use different values, from 2.5% to 15% in steps of 2.5%, for the threshold-determining ratio $q$ in EVaR.

[44] Because readers are often interested in whether findings change in periods of market stress, deviating from the rule of the previous footnote, Section A of our online appendix reports the spanning test results in merged subsamples with and without the dot-com crash and the global financial crisis.

[45] Fama and French (1992) use a similar procedure for extreme observations in book-to-market ratios.
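The winsorization step in the second check, for instance, amounts to percentile-based clipping and can be sketched in a few lines (a minimal NumPy version; the function name and the cross-sectional application are our illustrative assumptions):

```python
import numpy as np

def winsorize_returns(returns, lower=0.5, upper=99.5):
    """Winsorize a return sample: values below the `lower` percentile
    (in percent) are set to that percentile, and values above the
    `upper` percentile are set to that percentile."""
    r = np.asarray(returns, dtype=float)
    lo, hi = np.percentile(r, [lower, upper])
    return np.clip(r, lo, hi)
```

Applied to each month's cross section of returns, this caps the influence of extreme observations on regression-based and other risk measures while leaving the bulk of the distribution untouched.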

Finally, to relax our focus on equally weighted quantile portfolios and risk-weighted BAR portfolios, we consider several additional portfolio types. Specifically, we use value-weighted quantile portfolios and build simpler zero-cost arbitrage portfolios as the highest minus the lowest quantile. In addition, we construct our arbitrage portfolios as zero-investment factor-mimicking portfolios following Fama and French (1993), Daniel and Titman (1997) and Li et al (2016).[46] While, as with the previous sensitivity checks, our conclusions still hold, the alphas of the alternative arbitrage portfolios are lower. This is not surprising, however, because our BAR portfolios are designed to enhance the weight of stocks with very low and very high risk.

[46] At the end of each month, we sort stocks into size terciles using NYSE breakpoints, and then further sort each size tercile into terciles based on the risk measures. We obtain value-weighted monthly returns on a total of nine portfolios: three size portfolios for each of the three portfolios based on the risk characteristics. We then equally weight each risk portfolio across the size terciles to obtain returns on three risk portfolios that are size-independent. To calculate the return of the zero-cost portfolio representing the risk-based factor, we subtract the monthly return on the high-risk portfolio from that on the low-risk portfolio.
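The factor-mimicking construction just described can be sketched for a single month's cross section as follows. This is a rough illustration under stated simplifications: column names are ours, and sample breakpoints stand in for the NYSE breakpoints used in the paper:

```python
import numpy as np
import pandas as pd

def risk_factor_return(month_df):
    """Return of the risk-based factor for one month's cross section.

    `month_df` has columns 'size', 'risk', 'mcap' and 'ret' (illustrative
    names). Stocks are sorted into size terciles (here using sample
    breakpoints), then into risk terciles within each size tercile.
    Portfolio returns are value-weighted by 'mcap'; each risk tercile is
    then equally weighted across the size terciles, and the factor is the
    low-risk return minus the high-risk return.
    """
    df = month_df.copy()
    df['size_t'] = pd.qcut(df['size'], 3, labels=False)
    df['risk_t'] = df.groupby('size_t')['risk'].transform(
        lambda x: pd.qcut(x, 3, labels=False))
    # Value-weighted returns on the 3x3 grid of size/risk portfolios.
    df['wret'] = df['ret'] * df['mcap']
    g = df.groupby(['size_t', 'risk_t'])
    vw = g['wret'].sum() / g['mcap'].sum()
    low = vw.xs(0, level='risk_t').mean()   # low-risk, averaged over sizes
    high = vw.xs(2, level='risk_t').mean()  # high-risk, averaged over sizes
    return low - high
```

Repeating this month by month yields the return series of the size-neutral, risk-based zero-cost factor.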

6 Conclusion

Motivated by the threat that the low-risk anomaly poses to traditional asset pricing theory, we analyzed whether choosing an adequate risk measure from the wide variety of metrics used by investment professionals can generate a positive risk–return relationship. While answering this research question with a focus on the US stock market and the 25 most popular risk measures, we arrived at several interesting findings.

We documented an anomaly for each risk measure and found that the related arbitrage portfolios earn economically and statistically significant multifactor alphas that are very similar across risk measures. Thus, even though some of our risk measures are widely considered to be superior to the classic ones, no "right" risk measure in our selection could solve the low-risk puzzle. For example, the puzzle occurs for modern measures based on extreme value theory, and it even occurs for very simple measures using only a single observation of the historical return distribution.

As many of our risk measures are highly correlated, the practical literature would suggest that, in asset selection, the choice of risk measure may be irrelevant. However, using bivariate portfolio sorts, cross-sectional regressions and spanning tests, we showed that, even under such circumstances, most of our 25 anomalies are independent, such that investors trading based on one risk measure can earn significant gains by also trading based on another. This finding implies that more than one risk variable has predictive power for the cross section of future stock returns, and it calls for an extension of the multifactor asset pricing models used in future research. In particular, different measures of total risk, which have not been considered in previous empirical asset pricing studies, may be included, because it is not just systematic risk that appears to be priced. However, it should be kept in mind that, even with one or more total risk factors, the premium on risk is still negative.

Finally, we identified an interesting topic for future research. In a comparison of our different anomaly-separating techniques, we saw that, while the overall picture was similar in our application, specific results can differ. That is, depending on the methodology, a researcher might accept or reject a potentially new cross-sectional effect. This reinforces the claims of several recent studies that more research on the adequacy of such methods is needed (see Chordia et al 2017; Pukthuanthong et al 2017; Jegadeesh et al 2019). The low-risk anomaly offers a particularly interesting playground because it can be shown that, under the location-scale models of Meyer (1987) and Meyer and Rasche (1992), the population rankings of all of our risk measures are identical.[47] This allows the simulation of finite-sample data in an environment in which we know that all risk measures reflect the same cross-sectional effect, and consequently puts us in a position to investigate whether the anomaly-separating techniques adequately identify this feature or whether they falsely point toward the existence of separate effects.

[47] Note that some of the requirements of these models (see Schuhmacher and Auer 2014) are violated by our empirical data. Thus, our results reflect actual differences in information, not just estimation error.
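As a purely illustrative sketch of such a simulation environment (all parameter values below are our assumptions, and we restrict attention to deviation-type measures rather than the paper's full set of 25): under a location-scale model, each asset's return is $r_{it} = \mu_i + \sigma_i e_t$ with a shock series $e_t$ common to all assets, so any location-invariant, scale-equivariant risk measure ranks assets purely by $\sigma_i$:

```python
import numpy as np

# Simulate a small cross section of location-scale returns with a
# common shock series (illustrative parameter values).
rng = np.random.default_rng(42)
e = rng.standard_normal(250)                         # common shocks
mu = np.array([0.010, 0.020, 0.005, 0.015])          # asset locations
sigma = np.array([0.05, 0.10, 0.02, 0.08])           # asset scales
returns = mu[:, None] + sigma[:, None] * e[None, :]  # 4 assets x 250 days

# Three deviation-type risk measures: SD, mean absolute deviation, IQR.
sd = returns.std(axis=1)
mad = np.abs(returns - returns.mean(axis=1, keepdims=True)).mean(axis=1)
iqr = (np.percentile(returns, 75, axis=1)
       - np.percentile(returns, 25, axis=1))

# In this environment, all three measures imply the same risk ranking,
# since each is proportional to sigma_i.
assert (np.argsort(sd) == np.argsort(mad)).all()
assert (np.argsort(sd) == np.argsort(iqr)).all()
```

A finite-sample study along these lines would then check whether bivariate sorts, cross-sectional regressions and spanning tests correctly conclude that the measures share a single cross-sectional effect.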

Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

Acknowledgements

We thank the participants of the November 2015 CESifo Group Seminar (organized by Hans-Werner Sinn and Volker Meier) and the participants of the 15th Colloquium on Financial Markets (organized by Alexander Kempf) for valuable comments and suggestions. This project has been carried out under research grants from the Fritz Thyssen Stiftung (Az. 20.15.0.079 WW) and Wissenschaftsförderung der Sparkassen-Finanzgruppe e.V. The required data has been supplied by the German Research Foundation through the Collaborative Research Center 649 “Economic Risk”, Humboldt University of Berlin.
