# Journal of Risk

**ISSN:** 1465-1211 (print); 1755-2842 (online)

**Editor-in-chief:** Farid AitSahlia

# Covariance estimation for risk-based portfolio optimization: an integrated approach

#### Need to know

- We provide an integrated prediction and optimization (IPO) framework for covariance model parameter estimation.
- We consider minimum-variance, maximum-diversification and equal-risk-contribution portfolio models.
- IPO covariance models are compared to traditional "predict, then optimize" models based on OLS and maximum-likelihood.
- IPO models explicitly minimize decision error and can result in lower out-of-sample nominal costs and improved economic outcomes.

#### Abstract

Many problems in quantitative finance involve both predictive forecasting and decision-based optimization. Traditionally, covariance forecasting models are optimized with unique prediction-based objectives and constraints, and users are therefore unaware of how these predictions would ultimately be used in the context of the models’ final decision-based optimization. We present a stochastic optimization framework for integrating time-varying factor covariance models in a risk-based portfolio optimization setting. We make use of recent advances in neural-network architecture for efficient optimization of batch convex programs. We consider three risk-based portfolio optimizations: minimum variance, maximum diversification and equal risk contribution, and we provide a first-order method for performing the integrated optimization. We present several historical simulations using US industry and stock data, and we demonstrate the benefits of the integrated approach in comparison with the decoupled alternative.

## 1 Introduction

Many problems in quantitative finance involve both predictive forecasting and decision-based optimization. For example, in portfolio management we typically require estimates of expected return and covariance in order to form optimized portfolios. Traditionally, predictive models for estimating returns and covariances are optimized with unique prediction-based objectives and constraints, and they are therefore unaware of how these predictions would ultimately be used in the context of their final decision-based optimization. A prototypical “predict, then optimize” framework would first fit the predictive models, eg, by maximum likelihood (ML) or least squares, and then plug those estimates into the corresponding decision-based optimization program. While it is true that a perfect predictive model would lead to optimal decision-making, in reality all predictive models make some error, and thus an inefficiency exists in the “predict, then optimize” paradigm.

With the widespread adoption of machine learning and data science in operations research, there has been a growing body of literature on data-driven decision-making and the relative merits of decoupled versus integrated predictive decision-making (see, for example, Bertsimas and Kallus 2020; Donti et al 2017; Elmachtoub and Grigas 2020). Butler and Kwon (2021) proposed a stochastic optimization framework for integrating expected return predictions in a mean–variance optimization setting. Specifically, they followed the work of Agrawal et al (2019) and Amos and Kolter (2017), and they restructured the stochastic program as a neural network with a differentiable quadratic programming layer. They demonstrated that their integrated prediction and optimization (IPO) framework can result in lower out-of-sample costs and higher absolute and risk-adjusted performance in comparison with the “predict, then optimize” alternative.

The aim of this paper is to present the IPO framework for covariance estimation and to evaluate the effectiveness of a covariance estimation process that is made aware of its final decision-based errors. The key distinction is that under the integrated setting, the prediction model parameters are estimated in order to yield the best decisions, not to provide the best predictions. In order to isolate the impact of the integrated covariance estimation, we consider purely risk-based portfolio optimizations: minimum-variance (MV) (Markowitz 1952), maximum-diversification (MD) (Choueifaty and Coignard 2008) and equal-risk-contribution (ERC) portfolios (Maillard et al 2010).

The remainder of the paper is outlined as follows. In Section 2 we present a brief literature review in the field of covariance estimation and we summarize our primary contributions. In Section 3 we describe the covariance model and review the current approach to parameter estimation. We then state the IPO program presented by Butler and Kwon (2021) and provide the integrated formulations for the aforementioned risk-based optimizations. Specifically, we embed the decision-based objective program as a specialized convex programming layer in a feed-forward neural network. We make use of back-propagation for efficient gradient computation and propose a first-order method for optimizing the integrated program through stochastic gradient descent. We conclude with a simulation study using US industry sector data and US stock data to demonstrate that the integrated framework can provide lower realized costs that, in some cases, translate to improved out-of-sample economic performance in comparison with the “predict, then optimize” alternative.

## 2 Literature review

Most risk-based portfolio optimizations require as input a covariance matrix of asset returns. Indeed, volatility and covariance estimation has a long history in the field of quantitative finance and econometrics. The seminal work of Engle (1982) and Bollerslev (1986) culminated in the now widely accepted univariate generalized autoregressive conditional heteroscedasticity (GARCH) model, which estimates conditional variances as a linear combination of past realizations and variance estimates. With a relatively flexible lag structure, GARCH models have proven capable of capturing long-memory volatility effects and other so-called stylized facts common to many financial time series (Bollerslev 1990; Bollerslev et al 1994).
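
As a concrete illustration, the GARCH(1,1) conditional variance recursion, $\sigma_t^2=\omega+\alpha r_{t-1}^2+\beta\sigma_{t-1}^2$, can be sketched in a few lines. This is a minimal illustration with a simple sample-variance initialization, not a fitted model; the function name is ours.

```python
import numpy as np

def garch_11_variance(r, omega, alpha, beta):
    """GARCH(1,1) conditional variance recursion:
    sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    sigma2 = np.empty(len(r))
    sigma2[0] = np.var(r)                    # simple initialization choice
    for t in range(1, len(r)):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2
```

With $\alpha+\beta<1$ the recursion is stationary and mean-reverts to the long-run variance $\omega/(1-\alpha-\beta)$.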

The first direct extension to multivariate GARCH modeling, the VEC model (VEC-GARCH), was introduced by Bollerslev et al (1988), but it was found to be highly impractical for problems with more than three assets. Since then, several alternative multivariate GARCH models have been proposed, and we refer the reader to Bauwens et al (2003) for a comprehensive overview. Of the more practical multivariate GARCH models, the constant conditional correlation (CCC-GARCH) model, presented by Bollerslev (1990), overcomes the dimensionality issues of VEC-GARCH by assuming a constant correlation matrix. Fitting CCC-GARCH models is straightforward: individual univariate GARCH parameters are estimated via ML, and the CCC matrix is then estimated from the resulting GARCH residuals. Empirical studies demonstrate that the constant correlation assumption is valid for many asset classes (Tse 2000), and the effectiveness of the estimator is typically evaluated by its ability to minimize realized portfolio variance (see, for example, Lien et al 2002; Varga-Haszonits and Kondor 2007).

Under certain circumstances, the assumption of CCC may be unrealistic. The dynamic conditional correlation (DCC) GARCH model, which was introduced by Engle (2002), allows for a time-varying conditional correlation matrix and can handle problems of a moderate to large size. As described in Section 3.1, the DCC-GARCH model estimates a correlation matrix proxy process as a linear combination of past standardized realization cross products and its lagged correlation proxy estimates. Again, parameter fitting is typically performed by ML estimation using the following multistep approach.

1. For each asset, fit a univariate GARCH model by ML.
2. Estimate the unconditional correlation matrix from the GARCH residuals.
3. Estimate the conditional correlation dynamics by ML or composite likelihood (Pakel et al 2019).
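
The correlation dynamics in step (3) follow the DCC recursion. Below is a minimal sketch of the proxy update and rescaling, assuming the residuals are already standardized and using their sample covariance as the unconditional target; the function name is ours.

```python
import numpy as np

def dcc_correlation(eps, a, b):
    """DCC correlation proxy recursion (Engle 2002) for standardized
    residuals eps (T x n):
      Q_t = (1 - a - b) * Qbar + a * e_{t-1} e_{t-1}' + b * Q_{t-1},
      R_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2}."""
    Qbar = np.cov(eps, rowvar=False)         # unconditional target
    Q = Qbar.copy()
    R = []
    for e in eps:
        s = np.sqrt(np.diag(Q))
        R.append(Q / np.outer(s, s))         # rescale proxy to a correlation matrix
        Q = (1 - a - b) * Qbar + a * np.outer(e, e) + b * Q
    return np.array(R)
```

The rescaling guarantees that each $R_t$ has a unit diagonal even though the proxy $Q_t$ does not.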

For large-scale problems, it is common to model asset returns, and subsequently their covariance, according to a linear factor model (see, for example, Clarke et al 2005; Fan et al 2008; Goldfarb and Iyengar 2003). Factor GARCH models, originally introduced by Engle et al (1990), assume that returns are generated by a set of factors that are themselves conditionally heteroscedastic. More recently, De Nard et al (2019) proposed a time-varying factor covariance model for large-scale portfolio problems. Factor regression coefficients are fitted by ordinary least squares (OLS), and the GARCH dynamics are fitted according to the multistep ML optimization, described above. They demonstrate that their factor DCC-GARCH model, when combined with nonlinear shrinkage, can yield more efficient portfolio allocation, as measured by realized out-of-sample portfolio objectives.

### 2.1 Main contributions

It is clear that the literature is rich with sophisticated and highly effective covariance prediction models. The aim of this paper is not to provide a new covariance model, but rather to present an alternative framework for parameter estimation of existing models, under the assumption that the covariance estimate will ultimately be used as an input to a decision-based optimization problem. The IPO framework is largely inspired by the work of Butler and Kwon (2021) and Elmachtoub and Grigas (2020), both of whom provide methods for integrating prediction models in a mean–variance optimization context under static covariance assumptions. To our knowledge, this is the first empirical study of integrated covariance estimation in a risk-based portfolio optimization setting.

The IPO framework is similar in spirit to the work of Engle and Colacito (2006), who propose evaluating covariance prediction models by a custom loss function that quantifies the out-of-sample variance of optimized portfolios induced by the covariance estimate. The realized portfolio variance loss function was later adopted by Ledoit and Wolf (2017) and Engle et al (2019) but was never used for parameter estimation itself. Instead, these authors continued to perform parameter estimation by OLS-ML, and the custom loss function was only ever used as an economic measure of out-of-sample prediction accuracy. Indeed, we observe that in most studies discussed above the effectiveness of the covariance estimate is measured, at least in part, by its ability to minimize a realized decision-based portfolio objective, despite the fact that this is not the objective under which the model parameters are optimized.

The IPO framework, discussed in more detail in Section 3.3, provides a practical and approachable solution to this problem by directly integrating the prediction model parameter estimation process in the context of the final decision-based objectives and constraints. In this paper we consider three risk-based portfolio optimizations: MV, MD and ERC, under the standard long-only, fully invested constraint set. The covariance model, described in Section 3.1, closely follows the time-varying factor covariance model, proposed by De Nard et al (2019), and was chosen for its reported improvement in realized performance and its ability to handle problems of large dimension. Further, under traditional parameter estimation, fitting the covariance model requires solving two independent prediction optimization problems: OLS for fitting the factor coefficients and ML for fitting the multivariate GARCH dynamics. The IPO framework, on the other hand, optimizes all model parameters jointly, thus demonstrating the flexibility of the integrated approach and its ability to handle prediction models of arbitrary complexity.

Numerical experiments are described in Section 4 and are performed on a universe of 10 US industry sector portfolios, with weekly return data from January 1964 to October 2020, as well as a larger universe of 255 liquid US stocks, with weekly return data from January 1990 to December 2020. We consider two multi-factor models, based on the Fama–French three-factor (FF3) and five-factor (FF5) models (Fama and French 1993, 2015). Multivariate factor GARCH dynamics are modeled according to CCC-GARCH and DCC-GARCH. We therefore consider four covariance model specifications:

1. FF3-CCC is the Fama–French three-factor model with constant conditional correlation GARCH;
2. FF5-CCC is the Fama–French five-factor model with constant conditional correlation GARCH;
3. FF3-DCC is the Fama–French three-factor model with dynamic conditional correlation GARCH; and
4. FF5-DCC is the Fama–French five-factor model with dynamic conditional correlation GARCH.

From an experimental standpoint, our goal is to analyze the out-of-sample performance of IPO in a portfolio optimization setting with real asset price data. Our experiments should be interpreted as a proof of concept, rather than a fully comprehensive financial study. In summary, our analytical and experimental contributions are as follows.

- We present the IPO framework for MV portfolios, a special case of quadratic programs, and provide the relevant gradient equations for first-order optimization. Out-of-sample numerical results demonstrate that the integrated approach can provide a consistent and economically significant reduction in realized portfolio variance in comparison with the “predict, then optimize” alternative. Experimentation on the larger stock universe demonstrates that the difference in realized portfolio variance across the two methods increases as the number of assets in the portfolio increases.

- We present the IPO framework for MD portfolios, which is again a special case of general quadratic programs. Out-of-sample numerical results demonstrate that the IPO framework is successful in providing consistently higher realized portfolio diversification ratios. For our particular data set, however, the difference in realized diversification ratios is not statistically significant and therefore does not result in materially different economic outcomes in comparison with the traditional approach.

- We present the IPO framework for ERC portfolios. The introduction of a log-barrier term results in a convex nominal objective function that is not a standard quadratic program. We derive the relevant gradient equations for first-order optimization. Out-of-sample numerical results demonstrate that, in most cases, the integrated approach can yield a consistently lower dispersion in realized risk contributions, as measured by the Herfindahl index. For our particular data set we find that the magnitude of the difference in realized risk contributions is small and does not result in meaningfully different economic outcomes.

In summary, we observe that when the covariance prediction model is more poorly specified, the IPO MV framework will often yield consistently and significantly lower out-of-sample portfolio volatility, resulting in improved economic outcomes. We find this result encouraging from a proof-of-concept standpoint, and it further suggests that the IPO framework may be more resilient to model misspecification than a more traditional parameter estimation approach. For the MD and ERC portfolios, the improvement in realized costs is marginal and does not result in improved economic performance. These results are discussed in more detail in Section 4. Finally, we acknowledge that the results will vary considerably depending on the choice of data set and covariance estimation model; therefore, further experimentation is required.

## 3 Methodology

### 3.1 Covariance model

We begin by briefly describing the covariance prediction model under the traditional “predict, then optimize” framework, and we refer the reader to Appendix A online for full model details.

We consider a universe of $d_y$ assets and denote the matrix of (excess) return observations by $Y=[y^{(1)},y^{(2)},\dots,y^{(m)}]\in\mathbb{R}^{m\times d_y}$, with $m>d_y$. Asset returns $y^{(i)}$ are modeled according to a linear factor model of associated auxiliary variables $x^{(i)}\in\mathbb{R}^{d_x}$. The standard OLS regression model is given as

$$y^{(i)}=\theta_1^{\mathrm{T}}x^{(i)}+\varepsilon^{(i)}, \tag{3.1}$$

where $\theta_1\in\mathbb{R}^{d_x\times d_y}$ is the matrix of regression coefficients and $\varepsilon^{(i)}\sim\mathcal{N}(\mathbf{0},D)\in\mathbb{R}^{d_y}$ is the vector of residual returns. We let $x^{(i)}\sim\mathcal{N}(\mu_x,\Sigma^{(i)}(\theta_2,\theta_3))$, and therefore $\Sigma^{(i)}(\theta_2,\theta_3)\in\mathbb{R}^{d_x\times d_x}$ denotes the time-varying auxiliary covariance matrix. Here $\theta_2$ and $\theta_3$ denote the univariate and multivariate GARCH parameters, respectively, and are estimated by ML. Further, we assume that the residual returns are independent: $\mathrm{cov}(\varepsilon^{(i)},\varepsilon^{(j)})=0$ for all $i\ne j$. The static matrix $D\in\mathbb{R}^{d_y\times d_y}$ therefore denotes the diagonal matrix of residual variances.

Traditionally, $\theta_1$ is chosen in order to minimize a least-squares loss function. Specifically, given the training data set $S=\{(x^{(i)},y^{(i)})\}_{i=1}^{m}$, we choose $\hat{\theta}_1$ such that

$$\hat{\theta}_1=\underset{\theta_1}{\arg\min}\;\mathbb{E}_P\big[\|\theta_1^{\mathrm{T}}x^{(i)}-y^{(i)}\|_2^2\big], \tag{3.2}$$

where $\mathbb{E}_P$ denotes the expectation operator with respect to the training distribution. The resulting least-squares estimators are given by

$$\left.\begin{aligned}\hat{y}^{(i)}&=\hat{\theta}_1^{\mathrm{T}}x^{(i)},\\ \hat{V}^{(i)}&=\hat{\theta}_1^{\mathrm{T}}\hat{\Sigma}^{(i)}(\theta_2,\theta_3)\hat{\theta}_1+\hat{D},\end{aligned}\right\} \tag{3.3}$$

where $\hat{V}^{(i)}$ denotes the conditional covariance estimate of asset returns.
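
The assembly in (3.3) is a few lines of linear algebra. The sketch below uses arbitrary illustrative values for the loadings, auxiliary covariance and residual variances, rather than fitted OLS-ML estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_y = 3, 5

theta1 = rng.normal(size=(d_x, d_y))             # factor loadings (stand-in for an OLS fit)
A = rng.normal(size=(d_x, d_x))
Sigma_x = A @ A.T + np.eye(d_x)                  # auxiliary covariance for one date
D = np.diag(rng.uniform(0.1, 0.5, size=d_y))     # diagonal residual variances

V_hat = theta1.T @ Sigma_x @ theta1 + D          # asset covariance estimate, as in (3.3)
```

Because $\Sigma^{(i)}$ is positive definite and $D$ has strictly positive diagonal entries, the resulting estimate is a valid (positive-definite) covariance matrix.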

Our particular covariance model exhibits a complicated dependency between the various model parameters. As outlined in Appendix A online, a traditional parameter estimation approach would treat each parameter set independently, and it can be summarized by the following steps.

1. Fit the regression coefficients, $\theta_1$, by optimizing the OLS program (3.2).
2. Fit the univariate GARCH parameters, $\theta_2$, by ML.
3. Use the GARCH residuals to estimate the CCC matrix.
4. In the case of DCC-GARCH, fit the multivariate GARCH parameters, $\theta_3$, by ML.
5. Combine the volatility and correlation estimates to form the time-varying auxiliary covariance estimate.
6. Combine the auxiliary covariance estimate with the OLS coefficients to form the time-varying asset covariance estimate, as given by (3.3).

Going forward, for compactness and without loss of generality, we denote the covariance prediction model parameters by $\theta=[\theta_1,\theta_2,\theta_3,\dots]$ and let $f:\mathbb{R}^{d_x}\times\mathbb{R}^{d_\theta}\to\mathbb{R}^{d_y\times d_y}$ denote the $\theta$-parameterized prediction model:

$$\hat{V}^{(i)}=f(x^{(i)},\theta). \tag{3.4}$$

### 3.2 “Predict, then optimize” framework

Incorporating the covariance prediction into a decision-based optimization according to the traditional “predict, then optimize” framework is straightforward. We define the portfolio $z\in\mathbb{R}^{d_z}$, where element $z_i$ denotes the proportion of total capital invested in the $i$th asset. We assume that we have a risk-based nominal cost function $c:\mathbb{R}^{d_z}\times\mathbb{R}^{d_z\times d_z}\to\mathbb{R}$, which takes as input the portfolio decision vector, $z$, constrained to a feasible region $\mathbb{Z}\subset\mathbb{R}^{d_z}$, and the estimated covariance matrix $\hat{V}^{(i)}$. Generating optimal decisions therefore amounts to plugging in the covariance estimate $\hat{V}^{(i)}$ and solving the following deterministic optimization problem:

$$\underset{z\in\mathbb{Z}}{\text{minimize}}\;\; c(z,\hat{V}^{(i)}), \tag{3.5}$$

with optimal solution

$$\hat{z}^{(i)}(\hat{V}^{(i)})=\underset{z\in\mathbb{Z}}{\arg\min}\; c(z,\hat{V}^{(i)}). \tag{3.6}$$
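
A minimal “predict, then optimize” sketch: an illustrative covariance estimate is plugged into the nominal program under a long-only, fully invested constraint set, solved here with SciPy's general-purpose SLSQP routine rather than a specialized QP solver.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative covariance estimate; in the paper this is the model output (3.4).
V_hat = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

cost = lambda z: 0.5 * z @ V_hat @ z                      # risk-based nominal cost
cons = ({"type": "eq", "fun": lambda z: z.sum() - 1.0},)  # fully invested
bnds = [(0.0, None)] * 3                                  # long only
res = minimize(cost, np.ones(3) / 3, constraints=cons, bounds=bnds)
z_hat = res.x                                             # decision induced by the estimate
```

The decision quality is entirely determined by the plugged-in estimate: any error in `V_hat` propagates directly into `z_hat`.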

### 3.3 IPO

In the integrated framework, we are interested in optimizing $\theta$ in the context of the nominal cost, $c$, and its constraints, $\mathbb{Z}$, in order to minimize the average realized nominal cost of the policy $z^{*}(x,\theta)$ induced by this parameterization. That is, for a fixed instantiation $(x,\theta)$, we solve the nominal cost program (3.7) in order to determine the optimal action $z^{*}(x,\theta)$ corresponding to our observed training input $\{(x^{(i)},V^{(i)})\}_{i=1}^{m}$. Specifically, we solve

$$\underset{z\in\mathbb{Z}}{\text{minimize}}\;\; c(z,f(x,\theta)), \tag{3.7}$$

with the optimal solution given by

$$z^{*}(x,\theta)=\underset{z\in\mathbb{Z}}{\arg\min}\; c(z,f(x,\theta)). \tag{3.8}$$

Our objective is to find the model parameters that minimize the average realized nominal cost induced by the decision policy $z^{*}(x,\theta)$. The resulting IPO problem is a stochastic optimization problem,

$$\begin{aligned}\underset{\theta\in\Theta}{\text{minimize}}\quad&\mathbb{E}_P[c(z^{*}(x,\theta),V)]\\ \text{subject to}\quad&z^{*}(x,\theta)=\underset{z\in\mathbb{Z}}{\arg\min}\;c(z,f(x,\theta)),\end{aligned} \tag{3.9}$$

where $\Theta$ denotes the feasible region of prediction model parameters. In practice, we are typically presented with discrete training observations $S=\{(x^{(i)},V^{(i)})\}_{i=1}^{m}$, and we can therefore approximate the expectation by its sample average approximation (Shapiro et al 2009). The full IPO problem is presented in discrete form as

$$\begin{aligned}\underset{\theta\in\Theta}{\text{minimize}}\quad&\frac{1}{m}\sum_{i=1}^{m}c(z^{*}(x^{(i)},\theta),V^{(i)})\\ \text{subject to}\quad&z^{*}(x^{(i)},\theta)=\underset{z\in\mathbb{Z}}{\arg\min}\;c(z,f(x^{(i)},\theta))\quad\forall i\in 1,\dots,m.\end{aligned} \tag{3.10}$$

We denote the objective function of the IPO problem by

$$L(\theta)=\frac{1}{m}\sum_{i=1}^{m}c(z^{*}(x^{(i)},\theta),V^{(i)}). \tag{3.11}$$

Observe that the IPO formulation results in a complicated dependency of the model parameters, $\theta$, on the optimized values, $z^{*}(x^{(i)},\theta)$, connected through the $\arg\min$ operator. We discuss this relationship more fully in the context of a general quadratic cost function below. We then present the framework for the three risk-based portfolio optimizations under consideration.

### 3.4 IPO: general quadratic programming

We begin by considering the general quadratic program with feasible region, $\mathbb{Z}$, defined by both linear equality and inequality constraints. Specifically, we consider nominal cost functions of the form

$$c(z,V)=q^{\mathrm{T}}z+\tfrac{1}{2}z^{\mathrm{T}}Vz, \tag{3.12}$$

where $q\in\mathbb{R}^{d_z}$. Let $A\in\mathbb{R}^{d_{\text{eq}}\times d_z}$, $b\in\mathbb{R}^{d_{\text{eq}}}$ and $G\in\mathbb{R}^{d_{\text{iq}}\times d_z}$, $h\in\mathbb{R}^{d_{\text{iq}}}$ describe the linear equality and inequality constraints, respectively, giving the following quadratic program:

$$\begin{aligned}\underset{z}{\text{minimize}}\quad&q^{\mathrm{T}}z+\tfrac{1}{2}z^{\mathrm{T}}Vz\\ \text{subject to}\quad&Az=b,\\ &Gz\le h.\end{aligned} \tag{3.13}$$

The aim of the integrated approach is to solve for locally optimal parameters, $\theta^{*}$, that minimize the realized average decision-based objective $L(\theta^{*})$. In this paper we propose the use of a first-order (stochastic) gradient descent method. From the multivariate chain rule, the gradient, $\nabla_{\theta}L$, is expressed as

$$\nabla_{\theta}L=\frac{\partial L}{\partial z^{*}}\frac{\partial z^{*}}{\partial\theta}. \tag{3.14}$$

In our particular case, the nominal cost function, $c$, is smooth and twice differentiable in the decision vector $z$, and it is therefore relatively straightforward to compute the gradient $\partial L/\partial z^{*}$. The primary technical challenge lies in computing the Jacobian $\partial z^{*}/\partial\theta$, as it requires differentiation through the $\arg\min$ operator. As outlined by Amos and Kolter (2017), for a general quadratic program, the solution to the system of equations provided by the Karush–Kuhn–Tucker (KKT) conditions at optimality, $z^{*}$, provides the necessary ingredients for computing the desired Jacobian. This is described in full detail in Appendix B online.
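
To make the KKT-based differentiation concrete, the sketch below implicitly differentiates an equality-constrained MV program, a simplified instance of the general approach with the inequality constraints omitted; `solve_mv` and `dz_dV_direction` are our illustrative names, not the paper's implementation.

```python
import numpy as np

def solve_mv(V):
    """Solve min 0.5 z'Vz s.t. 1'z = 1 via its (linear) KKT system."""
    n = V.shape[0]
    K = np.block([[V, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    return np.linalg.solve(K, np.r_[np.zeros(n), 1.0])[:n]

def dz_dV_direction(V, dV):
    """Directional derivative of z*(V) along a covariance perturbation dV,
    obtained by differentiating the KKT system: K [dz; dnu] = [-dV @ z*; 0]."""
    n = V.shape[0]
    z = solve_mv(V)
    K = np.block([[V, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    rhs = np.concatenate([-dV @ z, [0.0]])
    return np.linalg.solve(K, rhs)[:n]
```

The analytic directional derivative can be verified against a central finite difference of `solve_mv`, which is exactly the consistency check one would run before trusting back-propagated gradients.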

In particular, we seek to compute the Jacobian,

$$\frac{\partial z^{*}}{\partial\theta}=\frac{\partial z^{*}}{\partial\hat{V}}\frac{\partial\hat{V}}{\partial\theta}.$$

In practice, however, we never explicitly form the Jacobian matrix or compute $\partial z^{*}/\partial\theta$ directly. We instead treat the nominal quadratic program as a differentiable layer in a neural network. The IPO equivalent neural-network structure is depicted in Figure 1. In the forward pass, the input layer takes the auxiliary variables $x^{(i)}$ and passes them to our prediction model, $f(x^{(i)},\theta)$, to produce the estimates $\hat{V}^{(i)}$. Covariance predictions are then passed to a differentiable quadratic programming layer which, for a given input $\hat{V}^{(i)}$, solves the nominal program and returns the optimal portfolio weights $z^{*}(x^{(i)},\theta)$. Finally, the quality of the portfolio weights is evaluated by the cost function, $c(z^{*}(x^{(i)},\theta),V^{(i)})$, in the context of the realized (true) covariance matrix, $V^{(i)}$.

The partial derivative of the cost with respect to the covariance estimate, $\hat{V}^{(i)}$, is computed by implicitly differentiating the solution mapping provided by the KKT conditions at optimality, $z^{*}(x^{(i)},\theta)$, as described in Appendix B online. The back-propagation algorithm then computes the remaining partial derivatives in the regular fashion in order to generate the gradient of the realized cost with respect to the prediction model parameters.

We now have what amounts to a first-order algorithm for optimizing covariance prediction models in the context of their final decision-based optimization. In practice, we search for a locally optimal solution using stochastic gradient descent. In this case, the descent direction, $d_{\theta}$, at each iteration attempts to approximate the gradient, $\nabla_{\theta}L$, and is given by

$$d_{\theta}=\sum_{i\in B}\left.\frac{\partial c}{\partial\theta}\right|_{(z^{*}(x^{(i)},\theta),\,V^{(i)})}\approx\nabla_{\theta}L,$$

where, in standard stochastic gradient descent, $B$ represents a randomly drawn sample batch from the training data.
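
A toy end-to-end training loop in the spirit of (3.10)-(3.11): a single scalar parameter scales the factor loadings of a deliberately misspecified covariance model, and we descend the average realized MV cost. For brevity the gradient is approximated by central finite differences over the full batch; the paper's approach instead back-propagates through the quadratic programming layer and samples mini-batches. All names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 32
B = rng.normal(size=(1, n))                        # single-factor loadings
# Realized covariance matrices with a time-varying factor variance.
V_real = [B.T @ B * (1 + 0.3 * np.sin(i)) + 0.05 * np.eye(n) for i in range(m)]

def solve_mv(V):
    """Equality-constrained MV portfolio via its KKT system."""
    K = np.block([[V, np.ones((n, 1))], [np.ones((1, n)), np.zeros((1, 1))]])
    return np.linalg.solve(K, np.r_[np.zeros(n), 1.0])[:n]

def L(theta):
    """Average realized nominal cost induced by the parameter theta."""
    cost = 0.0
    for V in V_real:
        V_hat = theta * B.T @ B + 0.05 * np.eye(n)   # static, misspecified model
        z = solve_mv(V_hat)                          # decision induced by the estimate
        cost += 0.5 * z @ V @ z                      # evaluated against realized V
    return cost / m

theta, lr, h = 0.1, 0.05, 1e-5
for _ in range(300):
    grad = (L(theta + h) - L(theta - h)) / (2 * h)   # finite-difference stand-in
    theta -= lr * grad
```

Over the course of training, `theta` moves away from its initial value of 0.1 toward the factor-variance scale of the realized covariances, lowering the realized cost even though the model remains misspecified.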

### 3.5 IPO: MV portfolio

The MV portfolio is a special case of the general quadratic program presented in (3.13). We let $q=\mathbf{0}$, and we therefore have the following nominal cost function:

$$c_{\mathrm{MV}}(z,V)=\tfrac{1}{2}z^{\mathrm{T}}Vz, \tag{3.15}$$

giving the nominal quadratic program

$$\begin{aligned}\underset{z}{\text{minimize}}\quad&\tfrac{1}{2}z^{\mathrm{T}}Vz\\ \text{subject to}\quad&Az=b,\\ &Gz\le h.\end{aligned} \tag{3.16}$$

In all experiments we define the constraint set $\mathbb{Z}=\{z\in\mathbb{R}^{d_z}\mid z^{\mathrm{T}}\mathbf{1}=1,\; z\ge 0\}$, and thus $z$ represents long-only, fully invested portfolios.

The full IPO formulation for the MV portfolio is straightforward:

$$\begin{aligned}\underset{\theta\in\Theta}{\text{minimize}}\quad&\frac{1}{m}\sum_{i=1}^{m}c_{\mathrm{MV}}(z^{*}(x^{(i)},\theta),V^{(i)})\\ \text{subject to}\quad&z^{*}(x^{(i)},\theta)=\underset{z\in\mathbb{Z}}{\arg\min}\;c_{\mathrm{MV}}(z,f(x^{(i)},\theta))\quad\forall i\in 1,\dots,m.\end{aligned} \tag{3.17}$$

### 3.6 IPO: MD portfolio

In acknowledgement that expected returns are often difficult to estimate, Choueifaty and Coignard (2008) proposed optimizing portfolios in order to maximize portfolio diversification. They define the diversification ratio as the ratio of weighted-average asset volatility to portfolio volatility. The negative diversification ratio cost is

$$c_{\mathrm{DR}}(z,V)=-\frac{z^{\mathrm{T}}\sqrt{\mathrm{diag}(V)}}{\sqrt{z^{\mathrm{T}}Vz}}. \tag{3.18}$$

As outlined by Choueifaty and Coignard (2008), the diversification ratio has many interesting properties. First, it is homogeneous of degree 0 and is therefore invariant under scalar multiplication of $z$. Second, for $V\succ 0$, the diversification ratio of any long-only portfolio is greater than or equal to 1, with equality achieved when the portfolio holds a single asset. Further, for long-only portfolios, the square of the diversification ratio quantifies the number of independent sources of risk, or bets, in the portfolio (Choueifaty et al 2013). Lastly, if expected (excess) returns, $\mu\in\mathbb{R}^{d_y}$, are proportional to volatility, then minimizing the negative diversification ratio is equivalent to minimizing the negative portfolio Sharpe ratio

$$c_{\mathrm{SR}}(z,V)=-\frac{z^{\mathrm{T}}\mu}{\sqrt{z^{\mathrm{T}}Vz}}. \tag{3.19}$$
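
For reference, the diversification ratio itself (the quantity negated in (3.18)) is a one-line computation; the function name is ours.

```python
import numpy as np

def diversification_ratio(z, V):
    """Weighted-average asset volatility over portfolio volatility
    (the negative of the cost in (3.18))."""
    return z @ np.sqrt(np.diag(V)) / np.sqrt(z @ V @ z)
```

A single-asset portfolio attains the lower bound of 1, while any portfolio mixing imperfectly correlated assets attains a ratio strictly above 1.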

Direct minimization of the negative diversification ratio results in the following nonconvex program:

$$\begin{aligned}\underset{z}{\text{minimize}}\quad&-\frac{z^{\mathrm{T}}\sqrt{\mathrm{diag}(V)}}{\sqrt{z^{\mathrm{T}}Vz}}\\ \text{subject to}\quad&Az=b,\\ &Gz\le h.\end{aligned} \tag{3.20}$$

In the case of long-only, fully invested portfolios, we follow Cornuejols and Tutuncu (2006) in recasting (3.20) as a convex quadratic program. This is possible as the inequality constraints are themselves homogeneous of degree 0. Specifically, we define $\mathbb{Z}_0=\{z\in\mathbb{R}^{d_z},\,k\in\mathbb{R}\mid k>0,\;z/k\in\mathbb{Z}\}\cup\{(\mathbf{0},0)\}$ as the homogenized constraint set, augmented with the zero vector. Optimizing program (3.20) is then equivalent to solving the following convex quadratic program:

$$\begin{aligned}\underset{z,k}{\text{minimize}}\quad&\tfrac{1}{2}z^{\mathrm{T}}Vz\\ \text{subject to}\quad&z^{\mathrm{T}}\sqrt{\mathrm{diag}(V)}=1,\\ &(z,k)\in\mathbb{Z}_0.\end{aligned} \tag{3.21}$$
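
A minimal sketch of the recast program (3.21) for a long-only portfolio, using an illustrative covariance matrix and SciPy's SLSQP routine; by homogeneity, the MD portfolio is recovered by rescaling the solution to sum to 1.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative covariance matrix.
V = np.array([[0.040, 0.012, 0.006],
              [0.012, 0.090, 0.018],
              [0.006, 0.018, 0.160]])
s = np.sqrt(np.diag(V))                       # asset volatilities

# Convex recast: min 0.5 y'Vy  s.t.  y's = 1, y >= 0.
res = minimize(lambda y: 0.5 * y @ V @ y, s / (s @ s),
               constraints=({"type": "eq", "fun": lambda y: y @ s - 1.0},),
               bounds=[(0.0, None)] * 3)
z_md = res.x / res.x.sum()                    # homogeneity lets us rescale to sum to 1
```

Because the recast program is convex, the rescaled solution globally maximizes the diversification ratio over the long-only simplex, so it is at least as diversified as the equal-weight portfolio.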

The full IPO formulation for the MD portfolio is

$$\begin{aligned}\underset{\theta\in\Theta}{\text{minimize}}\quad&\frac{1}{m}\sum_{i=1}^{m}c_{\mathrm{DR}}(z^{*}(x^{(i)},\theta),V^{(i)})\\ \text{subject to}\quad&z^{*}(x^{(i)},\theta)=\underset{z}{\arg\min}\;\tfrac{1}{2}z^{\mathrm{T}}f(x^{(i)},\theta)z\quad\forall i\in 1,\dots,m,\\ &z^{\mathrm{T}}\sqrt{\mathrm{diag}(f(x^{(i)},\theta))}=1,\quad(z,k)\in\mathbb{Z}_0.\end{aligned} \tag{3.22}$$

### 3.7 IPO: ERC portfolio

Maillard et al (2010) first proposed the ERC portfolio as a stable alternative to MV portfolios. By definition, the ERC portfolio holds all assets such that the risk contribution of each asset is made equal. As a result, the portfolio generally achieves a high degree of ex ante diversification, with portfolio weights that tend to be less concentrated than those of its MV and MD counterparts.

We begin by noting that portfolio standard deviation, $\sigma_{\mathrm{p}}=\sqrt{z^{\mathrm{T}}Vz}$, satisfies Euler’s identity:

$$\sigma_{\mathrm{p}}=\sum_{j=1}^{d_z}z_j\frac{\partial\sigma_{\mathrm{p}}}{\partial z_j}=z^{\mathrm{T}}\frac{\mathrm{d}\sigma_{\mathrm{p}}}{\mathrm{d}z}. \tag{3.23}$$

It follows then that the marginal risk contribution (MRC) is given by

$$\mathrm{MRC}(z_j)=\frac{z_j}{\sigma_{\mathrm{p}}}\frac{\partial\sigma_{\mathrm{p}}}{\partial z_j}=\frac{z_j(Vz)_j}{z^{\mathrm{T}}Vz}. \tag{3.24}$$

The ERC, also known as risk parity, is attained when $\mathrm{MRC}(z_j)=1/d_z$ for all $j\in(1,2,\dots,d_z)$. We follow Costa and Kwon (2020) to define the Herfindahl index of risk contributions as follows:

$${c}_{\mathrm{ERC}}(\boldsymbol{z},\boldsymbol{V})=\sum_{j=1}^{{d}_{z}}{\left(\frac{{(\boldsymbol{z}\odot \boldsymbol{V}\boldsymbol{z})}_{j}}{\boldsymbol{z}^{\mathrm{T}}\boldsymbol{V}\boldsymbol{z}}\right)}^{2}, \tag{3.25}$$

where $\odot$ denotes component-wise multiplication. Note that the values of ${c}_{\mathrm{ERC}}(\boldsymbol{z},\boldsymbol{V})$ range between $1/{d}_{z}$ for the ERC portfolio and 1 for a fully concentrated portfolio.
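The quantities in (3.24) and (3.25) are straightforward to compute. The following sketch (assuming `numpy`; the covariance matrix and weights are illustrative) verifies Euler's identity and the stated bounds of the Herfindahl index:

```python
import numpy as np

def risk_contributions(z, V):
    """Relative risk contributions MRC(z_j) = z_j (Vz)_j / (z'Vz);
    by Euler's identity they sum to one."""
    return z * (V @ z) / (z @ V @ z)

def herfindahl(z, V):
    """Herfindahl index of risk contributions, c_ERC(z, V)."""
    return np.sum(risk_contributions(z, V) ** 2)

# illustrative covariance matrix and long-only weights
V = np.array([[0.040, 0.010, 0.004],
              [0.010, 0.090, 0.012],
              [0.004, 0.012, 0.160]])
z = np.array([0.5, 0.3, 0.2])

sigma_p = np.sqrt(z @ V @ z)
euler = z @ (V @ z) / sigma_p   # z' (d sigma_p / d z)
```

For a fully concentrated portfolio the index equals one, and it is bounded below by $1/{d}_{z}$ whenever all risk contributions are nonnegative.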

In practice, rather than minimizing the Herfindahl index directly, we instead follow Maillard et al (2010) and consider an MV optimization subject to a diversification constraint:

$$\begin{aligned} \underset{\boldsymbol{z}}{\text{minimize}}\quad & \tfrac{1}{2}\boldsymbol{z}^{\mathrm{T}}\boldsymbol{V}\boldsymbol{z}\\ \text{subject to}\quad & \sum_{j=1}^{{d}_{z}}\ln({z}_{j})\ge h,\\ & \boldsymbol{z}\ge 0, \end{aligned} \tag{3.26}$$

where $h$ is an arbitrary constant. By relaxing the long-only constraint we observe that multiple solutions to program (3.26) exist: there are ${2}^{{d}_{z}}$ orthants in ${\mathbb{R}}^{{d}_{z}}$, and therefore there exist at most ${2}^{{d}_{z}}$ unique portfolios that achieve the risk-parity condition. We choose to work with the more general ERC program presented by Bai et al (2016), which allows for both positive and negative weights (though in this paper we only consider the positive orthant). Following Spinu (2013), we recast the problem as an inequality-constrained convex program:

$$\begin{aligned} \underset{\boldsymbol{z}}{\text{minimize}}\quad & \tfrac{1}{2}\boldsymbol{z}^{\mathrm{T}}\boldsymbol{V}\boldsymbol{z}-\sum_{j=1}^{{d}_{z}}{\kappa}_{j}\ln(-{G}_{jj}{z}_{j})\\ \text{subject to}\quad & \boldsymbol{G}\boldsymbol{z}\le 0, \end{aligned} \tag{3.27}$$

where $\boldsymbol{\kappa}\in {\mathbb{R}}^{{d}_{z}}$ is a vector of risk-contribution weights (ie, ${\kappa}_{j}=1/{d}_{z}$ for ERC). The diagonal matrix $\boldsymbol{G}\in {\mathbb{R}}^{{d}_{z}\times {d}_{z}}$, with diagonal elements ${G}_{jj}\in \{-1,1\}$, constrains the optimization to the relevant orthant of interest. Spinu (2013) notes that the objective function in program (3.27) is self-concordant and can be solved efficiently by Newton's method.

Finally, the full IPO formulation for the ERC portfolio is presented in the following program:

$$\begin{aligned} \underset{\boldsymbol{\theta}\in \Theta}{\text{minimize}}\quad & \frac{1}{m}\sum_{i=1}^{m}-{c}_{\mathrm{ERC}}(\boldsymbol{z}^{*}(\boldsymbol{x}^{(i)},\boldsymbol{\theta}),\boldsymbol{y}^{(i)})\\ \text{subject to}\quad & \boldsymbol{z}^{*}(\boldsymbol{x}^{(i)},\boldsymbol{\theta})=\underset{\boldsymbol{z}}{\operatorname{arg\,min}}\ \tfrac{1}{2}\boldsymbol{z}^{\mathrm{T}}f(\boldsymbol{x}^{(i)},\boldsymbol{\theta})\boldsymbol{z}-\sum_{j=1}^{{d}_{z}}{\kappa}_{j}\ln(-{G}_{jj}{z}_{j})\quad \forall i\in 1,\dots,m,\\ & \boldsymbol{G}\boldsymbol{z}\le 0. \end{aligned}$$

As in the quadratic programming case, we embed program (3.27) as a differentiable convex programming layer in an end-to-end trainable neural network. As before, we implicitly differentiate the KKT conditions at optimality, $\boldsymbol{z}^{*}(\boldsymbol{x}^{(i)},\boldsymbol{\theta})$, in order to compute the gradients of the cost function with respect to the relevant problem variables. This is described in full detail in Appendix C online.
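The implicit differentiation step can be illustrated on a simplified equality-constrained minimum-variance program, $\min_{\boldsymbol{z}}\frac{1}{2}\boldsymbol{z}^{\mathrm{T}}\boldsymbol{V}\boldsymbol{z}$ subject to $\boldsymbol{a}^{\mathrm{T}}\boldsymbol{z}=1$: differentiating the KKT system yields a linear system for the solution sensitivity. A sketch (assuming `numpy`; a finite-difference check stands in for the full backward pass, and the layer in the paper also handles inequality constraints):

```python
import numpy as np

def qp_layer(V, a):
    """Solve min 1/2 z'Vz s.t. a'z = 1 via its KKT system
    [V a; a' 0][z; nu] = [0; 1]."""
    n = len(a)
    K = np.block([[V, a[:, None]], [a[None, :], np.zeros((1, 1))]])
    return np.linalg.solve(K, np.r_[np.zeros(n), 1.0])[:n]

def qp_layer_jvp(V, a, dV):
    """Directional derivative of z*(V) in direction dV, obtained by
    implicitly differentiating the KKT conditions:
    K [dz; dnu] = [-dV z; 0]."""
    n = len(a)
    z = qp_layer(V, a)
    K = np.block([[V, a[:, None]], [a[None, :], np.zeros((1, 1))]])
    return np.linalg.solve(K, np.r_[-dV @ z, 0.0])[:n]

V = np.array([[0.05, 0.01],
              [0.01, 0.08]])
a = np.ones(2)
dV = np.array([[0.001, 0.0],
               [0.0, -0.002]])            # symmetric perturbation

dz_implicit = qp_layer_jvp(V, a, dV)
eps = 1e-6                                 # finite-difference check
dz_fd = (qp_layer(V + eps * dV, a) - qp_layer(V - eps * dV, a)) / (2 * eps)
```

Note that the implicit derivative automatically preserves feasibility to first order: $\boldsymbol{a}^{\mathrm{T}}\,\mathrm{d}\boldsymbol{z}=0$.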

## 4 Computational experiments

Sections 4.1 and 4.2 compare the IPO and OLS-ML methodologies across two asset universes:

- (1) 10 US industry sectors, with testing from January 1980 to October 2020; and

- (2) 255 US liquid stocks, with testing from January 1995 to December 2020.

In both sets of experiments, the MV portfolios with parameters estimated by IPO exhibit meaningfully lower out-of-sample costs, which in general result in improved economic performance in comparison with the traditional "predict, then optimize" approach. On the other hand, the US industry sector experiments with MD and ERC portfolios exhibit only a marginal reduction in out-of-sample costs. As a result, we find that the realized economic impact of the IPO framework under these portfolio objectives is relatively immaterial. That said, we are encouraged that the MD and ERC results under the IPO framework are no worse than those of the traditional estimation approach, suggesting, perhaps, that the IPO method converges to prediction model parameterizations of equal integrity. Further, we acknowledge that these findings may be particular to our chosen data sets, or alternatively that they may speak to the stability and resilience of the MD and ERC portfolios to errors in the covariance estimate. Answering these questions requires further testing. For completeness, the MD and ERC results are presented in Appendixes F and G online.

The MV results presented below are meant to serve as a proof of concept and to illustrate the potential benefit of the IPO framework in comparison with the traditional parameter estimation process based on OLS-ML. All performance is gross of trading costs and in excess of the 13-week Treasury bill rate, provided by the Federal Reserve Bank of St. Louis.

### 4.1 Experiment 1: US industry data

**Table 1** US industry sectors.

| Industry short name | Description |
|---|---|
| NoDur | Consumer nondurables: food, tobacco, textiles, apparel, leather and toys |
| Durbl | Consumer durables: cars, TVs, furniture and household appliances |
| Manuf | Manufacturing: machinery, trucks, planes, chemicals, paper and printing |
| Enrgy | Energy: oil, gas, coal extraction and products |
| HiTec | Technology: computers, software and electronic equipment |
| Telcm | Telecommunications: telephone and television transmission |
| Shops | Shops: wholesale, retail and services (eg, laundries, repair shops) |
| Hlth | Health: health care, medical equipment and pharmaceuticals |
| Utils | Utilities: water, sewage services and electricity |
| Other | Other: mines, construction, transportation, hotels, entertainment and finance |

We consider an asset universe of 10 US industry sectors, described in Table 1. The weekly price data ranges from January 1964 to October 2020 and is provided by the Kenneth R. French data library. We consider two factor models, based on the FF3 model and the FF5 model. Historical factor returns are also provided by the Kenneth R. French data library. The multivariate factor GARCH dynamics are modeled according to CCC-GARCH and DCC-GARCH. We refer the reader to Appendix D online for more comprehensive implementation details. For each nominal portfolio optimization we consider four covariance model instantiations: FF3-CCC, FF5-CCC, FF3-DCC and FF5-DCC.

All experiments start in January 1980 and end in October 2020. For each experiment, the first 15 years (from January 1964 to December 1979) are used to perform the initial parameter estimation. Thereafter, we apply a walk-forward training and testing methodology. The optimal covariance model coefficients are updated every two years using all available data for parameter estimation, and the optimal policy, $\boldsymbol{z}^{*}(\boldsymbol{x}^{(i)},\boldsymbol{\theta}^{*})$, is then applied to the next out-of-sample two-year segment. Portfolios are rebalanced approximately once a month. In order to minimize the impact of rebalance timing luck, the portfolios are formed at the close of each week, and we rebalance 25% of the exposure on a weekly basis (Hoffstein et al 2020).
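The 25% staggered rebalance can be sketched as a weekly blend of the held weights toward the current target; this is an illustration of the mechanics (assuming `numpy`), not the authors' exact implementation:

```python
import numpy as np

def staggered_rebalance(w_held, w_target, fraction=0.25):
    """Trade a fixed fraction of the gap to target each week; over a
    month this approximates four overlapping monthly-rebalanced
    sleeves, which mitigates rebalance timing luck."""
    return (1 - fraction) * w_held + fraction * w_target

w = np.array([0.5, 0.5])                 # currently held weights
target = np.array([0.8, 0.2])            # this month's target
for _ in range(4):                       # four weekly partial trades
    w = staggered_rebalance(w, target)
# after a month, 0.75**4 of the original gap to target remains
```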

We evaluate the IPO and OLS-ML methodologies based on their respective out-of-sample realized nominal cost values, as defined in Section 3. To quantify the magnitude and consistency of these results and to ensure our results are robust to potential outliers in the out-of-sample periods, we bootstrap the out-of-sample distribution using 1000 samples as follows.

- (1) For each $k\in \{1,2,\dots,1000\}$, sample (without replacement) a batch, ${B}_{k}$, with $|{B}_{k}|=52$ observations (one year) from the out-of-sample period.

- (2) For both the IPO and OLS-ML methods, compute the average realized costs over the sample:
$$\overline{c}^{\,\text{IPO}}_{{B}_{k}}=\frac{1}{|{B}_{k}|}\sum_{i\in {B}_{k}}c(\boldsymbol{z}^{*}(\boldsymbol{x}^{(i)},\boldsymbol{\theta}^{*}),\boldsymbol{y}^{(i)})\quad \text{and}\quad \overline{c}^{\,\text{OLS-ML}}_{{B}_{k}}=\frac{1}{|{B}_{k}|}\sum_{i\in {B}_{k}}c(\boldsymbol{z}^{*}(\boldsymbol{x}^{(i)},\widehat{\boldsymbol{\theta}}),\boldsymbol{y}^{(i)}),$$
where $\boldsymbol{\theta}^{*}$ and $\widehat{\boldsymbol{\theta}}$ denote the IPO and OLS-ML optimal coefficients, respectively.

Note that in each sample draw we use the same observation dates across both methodologies in order to fairly compare the realized nominal costs over the resulting sample. We report the dominance ratio (DR), which we define as the proportion of samples for which the realized cost of the IPO methodology is less than that of the OLS-ML methodology:

$$\mathrm{DR}=\frac{1}{N}\sum_{k=1}^{N}\mathbb{1}\{\overline{c}^{\,\text{IPO}}_{{B}_{k}}<\overline{c}^{\,\text{OLS-ML}}_{{B}_{k}}\}, \tag{4.1}$$

where $\mathbb{1}$ denotes the indicator function and $N=1000$. We also evaluate performance based on well-documented economic and portfolio risk metrics, summarized in Appendix E online.
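The bootstrap procedure and the dominance ratio in (4.1) can be sketched as follows (assuming `numpy`; the cost series here are synthetic placeholders, not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

def dominance_ratio(costs_ipo, costs_ols, n_samples=1000, batch=52):
    """Bootstrap the out-of-sample cost distribution: each draw uses
    the same dates for both methods, and DR is the fraction of draws
    in which the IPO batch-mean cost is lower."""
    T = len(costs_ipo)
    wins = 0
    for _ in range(n_samples):
        idx = rng.choice(T, size=batch, replace=False)
        wins += costs_ipo[idx].mean() < costs_ols[idx].mean()
    return wins / n_samples

# synthetic cost series in which IPO costs are roughly 10% lower
costs_ols = rng.gamma(2.0, 0.01, size=2000)
costs_ipo = 0.9 * costs_ols + rng.normal(0.0, 1e-4, size=2000)
dr = dominance_ratio(costs_ipo, costs_ols)
```

Sampling the same dates for both methods makes each comparison paired, which is what allows the DR to isolate the methodological difference.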

**Table 2** Economic performance metrics, January 1980–October 2020.

**(a) FF3-CCC**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.0933 | 0.0937 |
| Volatility | 0.1423 | 0.1511 |
| Sharpe | 0.6556 | 0.6196 |
| VaR | -0.0440 | -0.0468 |
| Avg DD | -0.0313 | -0.0327 |
| CDaR | -0.2091 | -0.2386 |

**(b) FF5-CCC**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.07925 | 0.08566 |
| Volatility | 0.1430 | 0.1445 |
| Sharpe | 0.5541 | 0.5930 |
| VaR | -0.0448 | -0.0445 |
| Avg DD | -0.0327 | -0.0295 |
| CDaR | -0.2190 | -0.2030 |

**(c) FF3-DCC**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.0951 | 0.0932 |
| Volatility | 0.1390 | 0.1489 |
| Sharpe | 0.6840 | 0.6261 |
| VaR | -0.0432 | -0.0462 |
| Avg DD | -0.0300 | -0.0310 |
| CDaR | -0.1983 | -0.2211 |

**(d) FF5-DCC**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.0803 | 0.0888 |
| Volatility | 0.1418 | 0.1418 |
| Sharpe | 0.5661 | 0.6260 |
| VaR | -0.0444 | -0.0438 |
| Avg DD | -0.0338 | -0.0287 |
| CDaR | -0.2296 | -0.1980 |

Economic performance metrics are provided in Table 2 for the time period from January 1, 1980 to October 31, 2020. We first note that, with the exception of the FF5-DCC model, the IPO method produces a lower long-term realized portfolio volatility. This result is encouraging for the IPO approach as it demonstrates that covariance prediction model parameters can be estimated effectively by a process that seeks to minimize the decision error induced by the estimate. Further, for covariance models using the FF3 factors, the excess mean returns of the IPO and OLS-ML methods are comparable and the IPO approach therefore exhibits an increase in long-term out-of-sample Sharpe ratios. We observe that for the FF3 models, the IPO method provides more conservative portfolio risk metrics, as measured by the value-at-risk (VaR), average drawdown (Avg DD) and conditional drawdown-at-risk (CDaR). However, this is not the case for the FF5 models, in which the IPO method exhibits marginally lower Sharpe ratios and larger risk metrics.

In Figure 2 we compare the realized portfolio variance costs across 1000 bootstrapped out-of-sample realizations, as described above. Observe that for the FF3 models the IPO method provides consistently lower realized portfolio variances. For both FF3 GARCH models we report a DR of 0.91, meaning that in 91% of the samples the IPO method realizes a lower portfolio variance than that of the OLS-ML method. In the case of the FF5 models, however, we do not observe a consistent reduction in realized portfolio variance. In fact, for the FF5-DCC model, the DR is 0.44, meaning that in 56% of the samples the OLS-ML method in fact realizes a lower portfolio variance than that of the IPO method.

Figure 3 charts the average 52-week rolling difference in realized volatility between the IPO and OLS-ML methods. Note that volatility differences less than 0 imply that the IPO method realized a lower average volatility in comparison with the OLS-ML method over that time period. For the FF3 models, we observe that in normal market conditions the difference in realized portfolio volatility across the two methods is small, with values oscillating around the zero line. Indeed, almost all the benefit of the IPO framework is observed during periods of market crisis, notably the dot-com bubble and the market downturn of the early 2000s (from April 1999 to December 2002) and the global financial crisis (from January 2008 to December 2009). As we will show in Section 4.2, these observations are largely consistent with the experimental results for the individual US stock universe.

In contrast, the out-of-sample results for the FF5 models on the US industry data set suggest no obvious preference for the IPO framework over the OLS-ML method. We hypothesize that the benefit of the IPO framework will be most observable when the chosen prediction model happens to be poorly specified. In our experiments, model misspecification can come from the following two primary sources.

- (1) Factor model misspecification: errors arise from an improper factor model that does not effectively explain the idiosyncratic variance of the assets. This can occur when using the less explanatory FF3 model or, as we will demonstrate in Section 4.2, when the number of assets in the portfolio (and thus the unexplained variance) is large.

- (2) Correlation model misspecification: errors arise from improper correlation model assumptions. This can occur, for example, because the CCC-GARCH model assumes constant asset correlations, whereas the DCC-GARCH allows correlations to vary over time.

We conjecture that when the number of assets in the portfolio is small, as is the case for the 10 US industry universe, the FF5 model is well-specified and, therefore, the resulting decision errors induced by the covariance predictions are small relative to those of the more poorly specified FF3 models. Again, these observations are consistent with the results in Section 4.2, where the IPO framework exhibits a monotonic reduction in relative portfolio variance as the number of assets in the portfolio increases.

**Table 3** US stock universe by GICS sector.

| GICS sector | Stock symbols |
|---|---|
| Comms services | CBB, CMCSA, DIS, FOX, IPG, LUMN, MDP, NYT, T, VOD, VZ |
| Consumer discretionary | BBY, CBRL, CCL, F, GPC, GPS, GT, HAS, HD, HOG, HRB, JWN, LB, LEG, LEN, LOW, MCD, NKE, NVR, NWL, PHM, PVH, ROST, TGT, TJX, VFC, WHR, WWW |
| Consumer staples | ADM, ALCO, CAG, CASY, CHD, CL, CLX, COST, CPB, FLO, GIS, HSY, K, KMB, KO, KR, MO, PEP, PG, SYY, TAP, TR, TSN, UVV, WBA, WMK, WMT |
| Energy | AE, APA, BKR, BP, COP, CVX, EOG, HAL, HES, MRO, OKE, OXY, SLB, VLO, WMB, XOM |
| Financials | AFG, AFL, AIG, AJG, AON, AXP, BAC, BEN, BK, BXS, C, GL, JPM, L, LNC, MMC, PGR, PNC, RJF, SCHW, STT, TROW, TRV, UNM, USB, WFC, WRB, WTM |
| Health care | ABMD, ABT, AMGN, BAX, BDX, BIO, BMY, CAH, CI, COO, CVS, DHR, HUM, JNJ, LLY, MDT, MRK, OMI, PFE, PKI, SYK, TFX, TMO, VTRS, WST |
| Industrials | ABM, AIR, ALK, AME, AOS, BA, CAT, CMI, CSL, CSX, DE, DOV, EFX, EMR, ETN, FDX, GD, GE, GWW, HON, IEX, ITW, JCI, KSU, LMT, LUV, MAS, MMM, NOC, NPK, NSC, PCA, RPH, PNR, ROK, ROL, RTX, SNA, SWK, TXT, UNP |
| Information technology | AAPL, ADBE, ADI, ADP, ADSK, AMAT, AMD, GLW, HPQ, IBM, INTC, MSFT, MSI, MU, ORCL, ROG, SWKS, TER, TXN, TYL, WDC, XRX |
| Materials | APD, AVY, BLL, CCK, CRS, ECL, FMC, GLT, IFF, IP, MOS, NEM, NUE, OLN, PPG, SEE, SHW, SON, VMC |
| Real estate | ALX, FRT, GTY, HST, PEAK, PSA, VNO, WRI, WY |
| Utilities | AEP, ATO, BKH, CMS, CNP, D, DTE, DUK, ED, EIX, ETR, EVRG, EXC, LNT, NEE, NFG, NI, NJR, OGE, PEG, PNM, PNW, PPL, SJW, SO, SWX, UGI, WEC, XEL |

### 4.2 Experiment 2: US stock data

We follow the experimental design as outlined in Costa and Kwon (2020). We consider an asset universe of 255 liquid US stocks traded on major US exchanges (New York Stock Exchange, Nasdaq, Amex and Arca). The universe is summarized in Table 3, with representative stocks from each of the Global Industry Classification Standard (GICS) sectors. Weekly price data is given from January 1990 to December 2020 and is provided by Quandl.

**Table 4** Mean and standard deviation of out-of-sample excess mean return, volatility and Sharpe ratio over 500 trials.

**(a) FF5-CCC ($n=25$)**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.1173 | 0.1146 |
| ${\sigma}_{\text{Mean}}$ | 0.0097 | 0.0093 |
| Volatility | 0.1589 | 0.1636 |
| ${\sigma}_{\text{Vol.}}$ | 0.0105 | 0.0136 |
| Sharpe | 0.7405 | 0.7042 |
| ${\sigma}_{\text{Sharpe}}$ | 0.0687 | 0.0687 |

**(b) FF5-DCC ($n=25$)**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.1178 | 0.1119 |
| ${\sigma}_{\text{Mean}}$ | 0.0099 | 0.0097 |
| Volatility | 0.1595 | 0.1611 |
| ${\sigma}_{\text{Vol.}}$ | 0.0101 | 0.0133 |
| Sharpe | 0.7413 | 0.6983 |
| ${\sigma}_{\text{Sharpe}}$ | 0.0743 | 0.0762 |

**(c) FF5-CCC ($n=50$)**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.1181 | 0.1160 |
| ${\sigma}_{\text{Mean}}$ | 0.0079 | 0.0071 |
| Volatility | 0.1511 | 0.1587 |
| ${\sigma}_{\text{Vol.}}$ | 0.0067 | 0.0095 |
| Sharpe | 0.7827 | 0.7333 |
| ${\sigma}_{\text{Sharpe}}$ | 0.0587 | 0.0551 |

**(d) FF5-DCC ($n=50$)**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.1190 | 0.1132 |
| ${\sigma}_{\text{Mean}}$ | 0.0073 | 0.0078 |
| Volatility | 0.1508 | 0.1551 |
| ${\sigma}_{\text{Vol.}}$ | 0.0064 | 0.0094 |
| Sharpe | 0.7898 | 0.7318 |
| ${\sigma}_{\text{Sharpe}}$ | 0.0552 | 0.0626 |

**(e) FF5-CCC ($n=100$)**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.1203 | 0.1182 |
| ${\sigma}_{\text{Mean}}$ | 0.0065 | 0.0054 |
| Volatility | 0.1456 | 0.1555 |
| ${\sigma}_{\text{Vol.}}$ | 0.0045 | 0.0063 |
| Sharpe | 0.8265 | 0.7615 |
| ${\sigma}_{\text{Sharpe}}$ | 0.0467 | 0.0428 |

**(f) FF5-DCC ($n=100$)**

| Statistic | IPO | OLS-ML |
|---|---|---|
| Excess mean | 0.1207 | 0.1155 |
| ${\sigma}_{\text{Mean}}$ | 0.0063 | 0.0055 |
| Volatility | 0.1454 | 0.1508 |
| ${\sigma}_{\text{Vol.}}$ | 0.0046 | 0.0054 |
| Sharpe | 0.8308 | 0.7668 |
| ${\sigma}_{\text{Sharpe}}$ | 0.0463 | 0.0427 |

We use the more explanatory FF5 model, with multivariate factor GARCH dynamics modeled according to both the CCC-GARCH and the DCC-GARCH. Our aim is to observe the behavior of the IPO framework in a larger-scale setting. We consider MV portfolios in an $n$-asset universe, for $n\in \{25,50,100\}$. For each covariance model and asset-size pair, we conduct a randomized trial experiment. Specifically, each experiment consists of 500 independent trials, where, at the beginning of each trial, a basket of $n$ assets is randomly drawn from the universe of 255 assets. The basket of $n$ assets is held constant throughout the duration of the trial.
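The randomized trial design amounts to repeatedly sampling fixed baskets from the universe; a sketch (assuming `numpy`; the ticker names are placeholders, not the actual universe):

```python
import numpy as np

rng = np.random.default_rng(7)
universe = [f"S{i:03d}" for i in range(255)]   # placeholder tickers

def random_trials(universe, n_assets, n_trials=500):
    """Each trial draws n_assets without replacement from the
    universe; the basket is then held fixed for the whole trial."""
    return [list(rng.choice(universe, size=n_assets, replace=False))
            for _ in range(n_trials)]

baskets = random_trials(universe, 25)
```

Sampling without replacement within each trial guarantees distinct assets per basket, while the 500 independent draws average out basket-selection luck.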

All experiments start in January 1995 and end in December 2020, with the first five years used to perform the initial parameter estimation. We apply a walk-forward training and testing methodology whereby the optimal covariance model coefficients are updated every five years using all available data for parameter estimation. As before, portfolios are formed at the close of each week, and we rebalance 25% of the exposure on a weekly basis.

In Table 4 we report the mean and standard deviation of excess mean returns, volatility and Sharpe ratios of out-of-sample returns for each covariance model and universe size pair. In all cases we observe that the IPO method produces a lower average out-of-sample volatility in comparison with the traditional OLS-ML approach. Further, the IPO method exhibits larger average excess mean returns, therefore resulting in higher average Sharpe ratios. Additionally, we observe that the difference in realized portfolio variance across the two methods is greatest when the number of assets in the portfolio is largest $(n=100)$. Moreover, the difference in realized portfolio variance across the two methods is more significant for the CCC-GARCH models than for the (perhaps) better-specified DCC-GARCH models. These two observations are consistent with our prior claim that the benefit of the IPO framework is most notable as prediction model misspecification increases. In other words, the IPO framework appears to be more resilient to model misspecification.

Figure 4 compares the average out-of-sample portfolio variance in each experiment measured over 500 independently generated trials. As before, we report the DR: the proportion of trials for which the IPO method realizes an out-of-sample variance smaller than that of the OLS-ML method. We observe that the IPO method provides consistently lower realized portfolio variances, with DRs in the range of 58% to 100% and increasing in proportion to the number of assets in the portfolio. Further, the DRs are generally larger for the CCC-GARCH models in comparison with the corresponding DCC-GARCH models at each fixed universe size. When the universe size is smallest ($n=25$) the IPO method reports modest DRs of 79% and 58% for the CCC-GARCH and DCC-GARCH, respectively. In practice, however, portfolio managers typically construct portfolios from a much larger pool of assets ($n\ge 50$). It is under these situations that we observe the greatest benefit of the IPO approach. When $n=50$ we observe DRs of 95% and 77%, whereas when $n=100$ we observe DRs of 100% and 91% for the CCC-GARCH and DCC-GARCH models, respectively. Therefore, we posit that asset managers who construct MV portfolios from a large number of assets would likely benefit from the IPO framework.

Figure 6 charts the average 52-week rolling difference in realized volatility between the IPO and OLS-ML methods. The upper and lower shaded regions denote the 97.5th and 2.5th percentile outcomes, respectively. Note that volatility differences less than 0 imply that the IPO method realized a lower average volatility over that time period. As before, observe that in normal market conditions the difference in realized portfolio volatility across the two methods is small, with values oscillating around the zero line. We therefore observe a negligible difference in realized portfolio volatility from applying the integrated parameter estimation approach during regular market conditions. In contrast, almost all the benefit of the IPO framework is observed during periods of market crisis. From the charts in Figure 6, we can clearly see three periods for which the IPO approach exhibits meaningfully lower portfolio volatility in comparison with the traditional OLS-ML approach. These periods coincide with the three major US market recessions that occurred during the out-of-sample period: specifically, the dot-com bubble and market downturn of the early 2000s (from April 1999 to December 2002), the global financial crisis (from February 2007 to December 2009) and the global Covid-19 pandemic (from March 2020 to December 2020).

**Table 5** Average difference in realized volatility (IPO minus OLS-ML) during the three US market recessions.

| Experiment | Dot-com bubble (Apr 1999 to Dec 2002) | Global financial crisis (Feb 2007 to Dec 2009) | Covid-19 pandemic (Mar 2020 to Dec 2020) |
|---|---|---|---|
| FF5-CCC ($n=25$) | -0.0113 | -0.0048 | -0.0200 |
| FF5-DCC ($n=25$) | -0.0106 | 0.0111 | -0.0183 |
| FF5-CCC ($n=50$) | -0.0140 | -0.0115 | -0.0330 |
| FF5-DCC ($n=50$) | -0.0141 | 0.0062 | -0.0297 |
| FF5-CCC ($n=100$) | -0.0144 | -0.0173 | -0.0430 |
| FF5-DCC ($n=100$) | -0.0170 | 0.0130 | -0.0442 |

In Table 5 we provide the average difference in realized volatility over the three US market recession periods. Again, values less than 0 imply that the IPO method realized a lower average volatility over that time period. In general, we observe that the relative reduction in volatility provided by the IPO method increases as the number of assets in the portfolio increases. Further, we observe that the IPO method exhibits materially lower portfolio volatility across all experiments during both the dot-com bubble and the Covid-19 pandemic, with average annualized volatility reductions in the range of 1.06–1.70% and 1.83–4.42%, respectively. The results during the global financial crisis are mixed: the IPO CCC-GARCH experiments exhibit average volatility reductions in the range of 0.48–1.73%, whereas the IPO DCC-GARCH experiments actually exhibit larger realized volatility during that time period, with values ranging from 0.62% to 1.30%. Nonetheless, we find these larger-scale results highly encouraging for the IPO framework, as they demonstrate that during most periods of market crisis, when investors are undoubtedly most concerned with portfolio risk, there is a high probability that covariance models integrated with their downstream decision-based optimizations will result in lower realized portfolio variance.

We conclude this section with a note on computational complexity. In contrast to the traditional approach, estimating prediction model parameters in an integrated setting can be computationally expensive, in particular as the number of problem variables, $p$, becomes large. Specifically, the IPO framework, described in Section 3.4, requires solving at each iteration of gradient descent at most $m$ constrained quadratic programs, where $m$ is the total number of training observations. The time complexity therefore scales linearly with the number of training observations, $m$, and the total number of gradient descent iterations, $n$. Convex quadratic programs, however, are known to be solvable by interior-point methods in polynomial time, with worst-case time complexity $O(p^{3})$ (Goldfarb and Liu 1991). Therefore, the worst-case time complexity for the IPO framework is $O(mnp^{3})$. In practice, quadratic programs are typically solved in far fewer iterations than their worst-case bound (Boyd and Vandenberghe 2004). In Figure 5 we compare the average runtime, in seconds, for the IPO and OLS-ML methods as a function of the number of assets, $p\in \{10,25,50,100,200\}$. We fix the number of training observations at $m=1000$ and set the maximum number of gradient descent iterations to $n=25$, acknowledging that the runtime scales linearly in both of these parameters. We observe that for problems with a small number of assets (50 or fewer) the IPO runtime is competitive with that of the traditional OLS-ML approach. However, as the number of assets increases, the expected runtime of the IPO method grows considerably, with runtimes of 100 seconds and 800 seconds for $p=100$ and $p=200$, respectively. The observed increase in runtime is much smaller than the worst-case complexity bound would suggest; nevertheless, when $p=200$ the expected runtime of the IPO method is a full order of magnitude larger than that of the OLS-ML method. Therefore, for asset managers who construct portfolios from a very large pool of assets, estimating prediction model parameters by IPO can be computationally burdensome. Improving the efficiency of the integrated framework is therefore an open problem and an interesting area of future research.

## 5 Conclusion and future work

In this paper we proposed an IPO framework for covariance model parameter estimation in the context of several risk-based portfolio optimizations. Specifically, we structured the problem as a stochastic program where, for a fixed instantiation of model parameters, we solved a series of deterministic nominal portfolio optimization programs. We investigated the IPO framework under four covariance model specifications and considered three nominal portfolio optimizations: MV, MD and ERC.

Covariance model parameters are optimized locally using the first-order method proposed by Butler and Kwon (2021), which restructures the IPO problem as a neural network with a differentiable convex programming layer. We provide the integrated formulation for the MV and MD portfolios, which are special cases of the general quadratic program originally presented by Agrawal et al (2019). The ERC portfolio is formulated as a convex optimization program, and the IPO formulation and relevant gradient equations are provided. We performed several historical simulations using US industry sectors and US stock data, and we compared the IPO framework against a traditional “predict, then optimize” framework, whereby parameters are optimized by OLS-ML. The numerical experiments for MV portfolios demonstrate the benefits of the IPO framework and provide a proof of concept that the integrated approach can result in lower out-of-sample realized nominal cost values. Further, we observe that prediction models that are more poorly specified exhibit the greatest relative improvement in realized nominal cost values and economic metrics, such as Sharpe ratios. This result is encouraging as it demonstrates the resilience of the IPO framework to model misspecification.

In the case of the MD and ERC portfolios, we observe that the IPO method provides a consistent reduction in realized out-of-sample costs. In both cases, however, the magnitude of the cost reduction is small and does not result in a material improvement in economic outcomes. We hypothesize that these observed results may be specific to our particular choice of data, or alternatively they may speak to the stability of the MD and ERC portfolio construction process. Further testing with alternative data sets, and under varying prediction model and portfolio constraint assumptions, is required in order to better determine the efficacy of the IPO approach under these portfolio decision settings.

Future work also includes incorporating other forms of prediction models into the IPO framework, as well as exploring methods for performing the more difficult joint prediction of asset returns and covariances. Applying the IPO under robust portfolio optimization is another interesting area of future research.

## Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

## References

- Agrawal, A., Amos, B., Barratt, S., Boyd, S., Diamond, S., and Kolter, J. Z. (2019). Differentiable convex optimization layers. Preprint (arXiv:1910.12430).
- Amos, B., and Kolter, J. Z. (2017). OptNet: differentiable optimization as a layer in neural networks. Preprint (arXiv:1703.00443).
- Bai, X., Scheinberg, K., and Tutuncu, R. (2016). Least-squares approach to risk parity in portfolio selection. Quantitative Finance 16(3), 357–376 (https://doi.org/10.1080/14697688.2015.1031815).
- Bauwens, L., Laurent, S., and Rombouts, J. (2003). Multivariate GARCH models: a survey. Working Paper, Social Science Research Network (https://doi.org/10.2139/ssrn.411062).
- Bertsimas, D., and Kallus, N. (2020). From predictive to prescriptive analytics. Management Science 66(3), 1025–1044 (https://doi.org/10.1287/mnsc.2018.3253).
- Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31(3), 307–327 (https://doi.org/10.1016/0304-4076(86)90063-1).
- Bollerslev, T. (1990). Modelling the coherence in short-run nominal exchange rates: a multivariate generalized ARCH model. Review of Economics and Statistics 72(3), 498–505 (https://doi.org/10.2307/2109358).
- Bollerslev, T., Engle, R. F., and Nelson, D. B. (1994). ARCH models. In Handbook of Econometrics, Engle, R. F., and McFadden, D. (eds), Volume 4, Chapter 49, pp. 2959–3038. Elsevier.
- Bollerslev, T., Engle, R. F., and Wooldridge, J. M. (1988). A capital asset pricing model with time-varying covariances. Journal of Political Economy 96(1), 116–131 (https://doi.org/10.1086/261527).
- Boyd, S., and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press (https://doi.org/10.1017/CBO9780511804441).
- Butler, A., and Kwon, R. H. (2021). Integrating prediction in mean–variance portfolio optimization. Preprint (arXiv:2102.09287).
- Choueifaty, Y., and Coignard, Y. (2008). Toward maximum diversification. Journal of Portfolio Management 35(1), 40–51 (https://doi.org/10.3905/JPM.2008.35.1.40).
- Choueifaty, Y., Froidure, T., and Reynier, J. (2013). Properties of the most diversified portfolio. The Journal of Investment Strategies 2(2), 49–70 (https://doi.org/10.21314/JOIS.2013.033).
- Clarke, R. G., de Silva, H., and Murdock, R. (2005). A factor approach to asset allocation. Journal of Portfolio Management 32(1), 10–21 (https://doi.org/10.3905/jpm.2005.599487).
- Cornuéjols, G., and Tütüncü, R. (2006). Optimization Methods in Finance. Mathematics, Finance and Risk. Cambridge University Press (https://doi.org/10.1017/CBO9780511753886).
- Costa, G., and Kwon, R. (2020). A robust framework for risk parity portfolios. Journal of Asset Management 21 (https://doi.org/10.1057/s41260-020-00179-w).
- De Nard, G., Ledoit, O., and Wolf, M. (2019). Factor models for portfolio selection in large dimensions: the good, the better and the ugly. Journal of Financial Econometrics 19(2), 236–257 (https://doi.org/10.1093/jjfinec/nby033).
- Donti, P. L., Amos, B., and Kolter, J. Z. (2017). Task-based end-to-end model learning in stochastic optimization. Preprint (arXiv:1703.04529).
- Elmachtoub, A. N., and Grigas, P. (2020). Smart “predict, then optimize”. Preprint (arXiv:1710.08005).
- Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4), 987–1007 (https://doi.org/10.2307/1912773).
- Engle, R. F. (2002). Dynamic conditional correlation: a simple class of multivariate generalized autoregressive conditional heteroskedasticity models. Journal of Business and Economic Statistics 20(3), 339–350 (https://doi.org/10.1198/073500102288618487).
- Engle, R. F., and Colacito, R. (2006). Testing and valuing dynamic correlations for asset allocation. Journal of Business and Economic Statistics 24(2), 238–253 (https://doi.org/10.1198/073500106000000017).
- Engle, R. F., Ledoit, O., and Wolf, M. (2019). Large dynamic covariance matrices. Journal of Business and Economic Statistics 37(2), 363–375 (https://doi.org/10.1080/07350015.2017.1345683).
- Engle, R. F., Ng, V. K., and Rothschild, M. (1990). Asset pricing with a factor-ARCH covariance structure: empirical estimates for Treasury bills. Journal of Econometrics 45(1), 213–237 (https://doi.org/10.1016/0304-4076(90)90099-F).
- Fama, E. F., and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33(3), 3–56 (https://doi.org/10.1016/0304-405X(93)90023-5).
- Fama, E. F., and French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics 116(1), 1–22 (https://doi.org/10.1016/j.jfineco.2014.10.010).
- Fan, J., Fan, Y., and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. Journal of Econometrics 147(1), 186–197 (https://doi.org/10.1016/j.jeconom.2008.09.017).
- Goldfarb, D., and Iyengar, G. (2003). Robust portfolio selection problems. Mathematics of Operations Research 28(1), 1–38 (https://doi.org/10.1287/moor.28.1.1.14260).
- Goldfarb, D., and Liu, S. (1991). An O(n³L) primal interior-point algorithm for convex quadratic programming. Mathematical Programming 49, 325–340 (https://doi.org/10.1007/BF01588795).
- Hoffstein, C., Faber, N., and Braun, S. (2020). Rebalance timing luck: the (dumb) luck of smart beta. Working Paper, Social Science Research Network (https://doi.org/10.2139/ssrn.3673910).
- Ledoit, O., and Wolf, M. (2017). Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets Goldilocks. Review of Financial Studies 30(12), 4349–4388 (https://doi.org/10.1093/rfs/hhx052).
- Lien, D., Tse, Y., and Tsui, A. (2002). Evaluating the hedging performance of the constant-correlation GARCH model. Applied Financial Economics 12, 791–798 (https://doi.org/10.1080/09603100110046045).
- Maillard, S., Roncalli, T., and Teïletche, J. (2010). On the properties of equally weighted risk contributions portfolios. Journal of Portfolio Management 36(4), 60–70 (https://doi.org/10.3905/jpm.2010.36.4.060).
- Markowitz, H. (1952). Portfolio selection. Journal of Finance 7(1), 77–91 (https://doi.org/10.1111/j.1540-6261.1952.tb01525.x).
- Pakel, C., Engle, R. F., Sheppard, K., and Shephard, N. (2019). Fitting vast dimensional time-varying covariance models. Journal of Business and Economic Statistics (https://doi.org/10.1080/07350015.2020.1713795).
- Shapiro, A., Dentcheva, D., and Ruszczyński, A. (2009). Lectures on Stochastic Programming. MOS–SIAM Series on Optimization. SIAM, Philadelphia, PA (https://doi.org/10.1137/1.9780898718751).
- Spinu, F. (2013). An algorithm for computing risk parity weights. Working Paper, Social Science Research Network (https://doi.org/10.2139/ssrn.2297383).
- Tse, Y. (2000). A test for constant correlations in a multivariate GARCH model. Journal of Econometrics 98(1), 107–127 (https://doi.org/10.1016/S0304-4076(99)00080-9).
- Varga-Haszonits, I., and Kondor, I. (2007). Noise sensitivity of portfolio selection in constant conditional correlation GARCH models. Physica A 385(1), 307–318 (https://doi.org/10.1016/j.physa.2007.06.017).
Copyright Infopro Digital Limited. All rights reserved.