# Skewed target range strategy for multiperiod portfolio optimization using a two-stage least squares Monte Carlo method

## Rongju Zhang, Nicolas Langrené, Yu Tian, Zili Zhu, Fima Klebaner and Kais Hamza

#### Need to know

• A novel investment strategy that maximizes the expected portfolio value bounded within a target range.
• This strategy achieves a similar efficient frontier and a better downside–return trade-off compared with the CRRA utility.
• A two-stage regression method that improves the least squares Monte Carlo algorithm.

#### Abstract

In this paper, we propose a novel investment strategy for portfolio optimization problems. The proposed strategy maximizes the expected portfolio value bounded within a targeted range, composed of a conservative lower target representing a need for capital protection and a desired upper target representing an investment goal. This strategy favorably shapes the entire probability distribution of returns, as it simultaneously seeks a desired expected return, cuts off downside risk and implicitly caps volatility and higher moments. To illustrate the effectiveness of this investment strategy, we study a multiperiod portfolio optimization problem with transaction costs and develop a two-stage regression approach that improves the classical least squares Monte Carlo (LSMC) algorithm when dealing with difficult payoffs, such as highly concave, abruptly changing or discontinuous functions. Our numerical results show substantial improvements over the classical LSMC algorithm for both the constant relative risk-aversion (CRRA) utility approach and the proposed skewed target range strategy (STRS), and they illustrate the ability of the STRS to contain the portfolio value within the targeted range. When compared with the CRRA utility approach, the STRS achieves a similar mean–variance efficient frontier while delivering a better downside risk–return trade-off.

## 1 Introduction

A crucial and long-standing problem in the theory and practice of portfolio optimization is the choice of an effective and transparent performance criterion that balances risk and return. In this paper, we propose a novel portfolio optimization criterion that aims to combine the respective strengths of the classical criteria considered in the literature.

The origin of the literature corresponds to the notion of decision making under uncertainty. From there, von Neumann and Morgenstern (1944) proposed the expected utility approach, in which investment preferences are captured by a utility function. The shortcomings of this approach include the abstract nature of utility functions, which can make them impractical, and its omission of several practical aspects of actual decision making, as identified by the cumulative prospect theory of Tversky and Kahneman (1992): see, for example, Barberis (2012).

The mean–variance framework of Markowitz (1952), which uses variance to measure risk, can reliably approximate the quadratic utility case. When asset returns are assumed to be normally distributed, many other risk measures have been found that are equivalent to variance (eg, equivalence to the first- and second-order lower partial moments has been proved by Klebaner et al (2017)), but the mean–variance framework greatly benefits from its simple quadratic formulation.

Some may argue that variance is an inadequate measure of portfolio risk, as asset returns usually exhibit the so-called leptokurtic property, meaning that higher moments may need to be incorporated into the optimization. We refer to Lai (1991) and Konno et al (1993) for the skewness component and to Davis and Norman (1990) for both skewness and kurtosis. Another approach to address the issue of nonnormality of asset returns is to use a downside risk measure. The most common downside risk measures are the lower-partial moments (eg, semivariance introduced in Markowitz (1959)), value-at-risk (VaR; Longerstaey (1996)) and conditional value-at-risk (CVaR; Rockafellar and Uryasev (2000), also known as expected shortfall). These measures can replace variance to form a mean–downside risk approach: see Harlow (1991) for a mean–lower-partial moment framework, Alexander and Baptista (2002) for the mean–VaR framework and Agarwal and Naik (2004) for the mean–CVaR framework.

The last main strand of the literature corresponds to target-based strategies that aim to track a prespecified investment target. A popular target-based strategy is to maximize the probability of achieving a return target (see Browne (1999a) for a fixed absolute target and Browne (1999b), Pham (2003), Gaivoronski et al (2005) and Morton et al (2006) for relative benchmark targets). Alternatively, one can minimize the probability of an undesirable outcome (see, for example, Hata et al 2010; Nagai 2012; Milevsky et al 2006). An explicitly specified investment target makes a portfolio optimization strategy easier to understand and monitor in practice. However, choosing a suitable investment target that properly balances risk and return remains a challenging task.

Building upon these classical investment criteria, we propose the so-called skewed target range strategy (STRS). This maximizes the expected portfolio value bounded within a prespecified target range, which is composed of a conservative lower target representing a need for capital protection and a desired upper target corresponding to an ideal return level that the investor wishes to achieve. Implicitly, the optimization can be described as maximizing the probability that the realized return lies within the targeted range and as close to the upper target as possible.

There are three main motivations behind the proposed STRS. The first motivation traces back to the primary purpose of an investment objective function, which is to carve a desirable shape for the probability distribution of returns. The STRS, seeking a desirable expected return while chopping off most of the tails of the distribution beyond the targeted range, restrains the entire return distribution. The second motivation comes from the difficulty of specifying a single return target for classical target-based strategies, which cannot simultaneously serve the pursuit of a desired investment target and downside protection. The STRS solves this dilemma by using an upper target that accounts for return-seeking preference combined with a lower target that accounts for loss-aversion preference. Finally, performance criteria such as utility functions depending on abstract parameters with unforeseeable practical effects are unlikely to be adopted by investors. Our proposition of two explicit targets labeled in terms of returns, with intuitive purposes (capital protection for the lower target and desired investment return for the upper target), serves as a more practical investment criterion.

To test the effectiveness of the proposed STRS (formulated in Section 2), we study a multiperiod portfolio optimization problem with proportional transaction costs. To do so, we modify the classical least squares Monte Carlo (LSMC) algorithm to use a two-stage regression technique; this makes the problem of approximating the abrupt STRS objective function (2.1) as easy as approximating a linear function. The LSMC literature and the details of the proposed two-stage LSMC method are further discussed in Section 3. We show that this two-stage LSMC method is numerically more stable than the classical LSMC method for both the smooth constant relative risk-aversion (CRRA) utility approach and the abrupt STRS. We find that an appropriate level for the lower target is the initial portfolio value, as it marginally minimizes the standard deviation and the downside risk of the terminal portfolio value. Importantly, we show that the STRS criterion behaves as expected from its design: the portfolio value is well targeted within the specified range, and the downside risk is robust with respect to the choice of the upper target. We numerically show that the STRS achieves a similar mean–variance efficient frontier while delivering a better downside risk–return trade-off when compared with the CRRA utility optimization approach. We also provide two simple extensions of the STRS, described in Section 4. The first extension, dubbed the flat target range strategy (FTRS), corresponds to the pure probability maximization of achieving a targeted range, without a further attempt to pursue a higher return. The FTRS is useful for problems where maintaining solvency is more important than seeking high returns, for example, for long-term pension schemes, retirement funds and life-cycle management. 
The second extension, dubbed the relative target range strategy (RTRS), focuses on relative returns: it involves a return target range defined in terms of excess return over a stochastic benchmark, such as a stock market index, an interest rate or an inflation rate. All the numerical results are presented in Section 5.

## 2 Skewed target range strategy

In this section, we define the STRS for portfolio optimization problems and discuss the potential benefits of this strategy. We consider a portfolio optimization problem with $d$ risky assets available over a finite time horizon $T$. Let $\smash{\bm{\alpha}_{t}=\{\alpha_{t}^{i}\}_{1\leq i\leq d}}$ be the portfolio weight in each risky asset at time $t$, and denote by $W_{t}$ the portfolio value (or wealth). Assume that the investor aims to maximize the expectation of some function of the terminal portfolio value $\mathbb{E}[f(W_{T})]$. Then, the objective function simply reads

 $\sup_{\bm{\alpha}}\mathbb{E}[f(W_{T})],$ (2.1)

where the investment preference is characterized by the function $f(\cdot)$. In this paper, we propose the following parametric shape:

 $f(w)=(w-L_{W})\bm{1}\{L_{W}\leq w\leq U_{W}\},$ (2.2)

where $L_{W}\in\mathbb{R}$ represents a conservative lower target, $U_{W}\in\mathbb{R}$ represents a desired upper target, and the indicator function $\bm{1}\{L_{W}\leq w\leq U_{W}\}$ returns $1$ if $L_{W}\leq w\leq U_{W}$ and $0$ otherwise. We refer to the shape (2.2) and the corresponding objective (2.1) as the STRS. Throughout this paper, we normalize the portfolio value $W$ and the bounds $[L_{W},U_{W}]$ by the initial portfolio value $W_{0}$. Indeed, (2.2) shows that

 $f(w;L_{W},U_{W})=W_{0}\times f\bigg(\frac{w}{W_{0}};\frac{L_{W}}{W_{0}},\frac{U_{W}}{W_{0}}\bigg),$

so we can assume without loss of generality that $W_{0}=1$ and set the bounds $L_{W}$ and $U_{W}$ in the vicinity of $1$. Figure 1 shows an example of (2.2) with $L_{W}=1.0$ and $U_{W}=1.2$.
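As a minimal illustration, the target range shape (2.2) can be coded directly. The function name below is ours, and the defaults $L_{W}=1.0$, $U_{W}=1.2$ are simply the example values of Figure 1:

```python
import numpy as np

def strs_payoff(w, lower=1.0, upper=1.2):
    """STRS shape f(w) = (w - L_W) * 1{L_W <= w <= U_W} from (2.2).

    Wealth is normalized by the initial portfolio value, so the
    targets sit in the vicinity of 1 (illustrative sketch).
    """
    w = np.asarray(w, dtype=float)
    return np.where((w >= lower) & (w <= upper), w - lower, 0.0)
```

Realized wealth below the capital-protection level or above the investment goal contributes zero, which is what skews the optimizer toward the inside of the range.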

From (2.2), one can see that the objective is to maximize the expected terminal portfolio value within the interval $[L_{W},U_{W}]$, while the values outside this interval are penalized down to zero. This strategy implicitly combines two objectives: maximizing the expected terminal portfolio value and maximizing the probability that the terminal portfolio value lies within the chosen target range $[L_{W},U_{W}]$.

On the left-hand side of the skewed shape in (2.2), the function is convex at the lower target $L_{W}$. This is consistent with the cumulative prospect theory of Tversky and Kahneman (1992), which states that investors tend to be risk-seeking when losing money. By contrast, on the right-hand side of the skewed shape, the function is discontinuous and jumps down to zero at the upper target $U_{W}$. This is a distinctive feature of the STRS compared with classical utility functions as well as cumulative prospect theory. In particular, forgoing the upside potential beyond the upper target $U_{W}$ seems to conflict with the nonsatiation axiom that people prefer more to less. The following explains the importance of this upper threshold.

All else being equal (ceteris paribus assumption), one would expect people to prefer more to less. This axiom in the context of dynamic stochastic portfolio optimization can be interpreted as follows: the downside risk being fixed (the left tail of the return distribution), investors would prefer higher upside potential (a longer right tail of the return distribution). However, after extensive numerical experiments, we came to the conclusion that nondecreasing utility functions are unable to decouple upside potential from downside risk. Indeed, pursuing higher upside potential leads to riskier portfolio decisions, which may result in a return distribution with a large right tail (gains) as well as a large left tail (losses). As the ceteris paribus assumption does not apply in this stochastic context, one cannot rule out the existence of a satiation level. Such a level is determined by the investor’s preference with respect to risk and return.

As upside potential and downside risk are naturally intertwined, the proposed upper target is able to curtail downside risk by addressing its main cause: namely, the pursuit of excessive upside potential. As a result, the realized returns can be well contained within the targeted range with a high degree of confidence, which in several contexts is more important than allowing for the possibility of rare windfall returns at the cost of higher downside risk.

## 3 Multiperiod portfolio optimization

In this section, we consider a multiperiod portfolio optimization problem, formulate it as a discrete-time dynamic programming problem and develop a two-stage LSMC method to solve it. The LSMC algorithm, originally developed by Carriere (1996), Longstaff and Schwartz (2001) and Tsitsiklis and Van Roy (2001) for pricing American options, has been extended to solve dynamic portfolio optimization problems by several researchers. Brandt et al (2005) consider a CRRA utility function and determine a semi-closed-form solution by solving the first-order condition of the Taylor series expansion of the value function. Cong and Oosterlee (2016a, b) consider a target-based mean–variance objective function and use a suboptimal strategy to perform the forward simulation of control variables, which are iteratively updated in the backward recursive programming. Later, Cong and Oosterlee (2017) combine the stochastic bundling technique of Jain and Oosterlee (2015) with the method of Brandt et al (2005). Zhang et al (2019) consider a CRRA utility function and adopt the control randomization technique of Kharroubi et al (2014) for a portfolio optimization problem with switching costs including transaction costs, liquidity costs and market impact.

The aforementioned works solve problems with a continuous payoff function, for which the classical LSMC method can be very effective. By contrast, highly nonlinear, abruptly changing or discontinuous payoffs are more difficult for the LSMC algorithm to handle (Zhang et al 2019; Balata and Palczewski 2018; Andreasson and Shevchenko 2018). The STRS (2.2), with its abrupt drop at the upper bound $U_{W}$, is such a difficult function. In addition, as the terminal wealth outside the targeted range is truncated to zero in the value function, a direct regression on these zeros would forgo the original information in the wealth variable. In this section, we propose a two-stage LSMC method to overcome these issues.

### 3.1 Dynamic programming

Denote by $\smash{R^{f}}$ the cumulative return of the risk-free asset over a single period. Denote by $\smash{\bm{R}_{t}=\{R_{t}^{i}\}_{1\leq i\leq d}}$ the excess returns of the risky assets over the risk-free rate, and denote by $\bm{Z}_{t}$ the vector of return predictors. The optimization problem in (2.1) can be formulated as a stochastic control problem with exogenous state variables $\bm{Z}_{t}$ and one endogenous state variable $W_{t}$. Let $\mathcal{A}\subseteq\mathbb{R}^{d}$ be the set of admissible portfolio weights. The value function in (2.1) can now be rewritten as

 $v_{t}(z,w):=\sup_{\{\bm{\alpha}_{\tau}\in\mathcal{A}\}_{t\leq\tau\leq T}}\mathbb{E}[f(W_{T})\mid\bm{Z}_{t}=z,\ W_{t}=w].$ (3.1)

Consider an equidistant discretization of the investment horizon $[0,T]$, denoted by $0=t_{0}<t_{1}<\dots<t_{N}=T$. The wealth process evolves as

 $W_{t_{n+1}}=W_{t_{n}}(R^{f}+\bm{\alpha}_{t_{n}}\bm{R}_{t_{n+1}}),$ (3.2)

and the value function satisfies the following dynamic programming principle:

 $\displaystyle v_{t_{N}}(z,w)$ $\displaystyle=f(w),$ $\displaystyle v_{t_{n}}(z,w)$ $\displaystyle=\sup_{\bm{\alpha}_{t_{n}}\in\mathcal{A}}\mathbb{E}[v_{t_{n+1}}(\bm{Z}_{t_{n+1}},W_{t_{n+1}})\mid\bm{Z}_{t_{n}}=z,\ W_{t_{n}}=w],$ (3.3)

where $f(w)=(w-L_{W})\bm{1}\{L_{W}\leq w\leq U_{W}\}$.

### 3.2 Classical least squares Monte Carlo

The first part of the LSMC algorithm is the forward simulation of all the stochastic state variables. Let $M$ denote the number of Monte Carlo simulations. The return predictors $\{\bm{Z}_{t_{n}}^{m}\}_{0\leq n\leq N}^{1\leq m\leq M}$ and the asset excess returns $\{\bm{R}_{t_{n}}^{m}\}_{0\leq n\leq N}^{1\leq m\leq M}$ are generated through some predetermined return dynamics. By contrast, the wealth process is an endogenous state variable depending on the realization of the portfolio weights. We follow the control randomization approach of Kharroubi et al (2014): we randomly generate uniform portfolio weights $\{\tilde{\bm{\alpha}}_{t_{n}}^{m}\}_{0\leq n\leq N}^{1\leq m\leq M}$ and then compute the corresponding portfolio values $\{\tilde{W}_{t_{n}}^{m}\}_{0\leq n\leq N}^{1\leq m\leq M}$ according to (3.2).
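A minimal sketch of this forward pass follows, under placeholder assumptions of our own: iid Gaussian excess returns standing in for the paper's return dynamics, an assumed one-period risk-free gross return `Rf`, and box-constrained long-only weights as the admissible set.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, d = 10_000, 12, 2          # paths, periods, risky assets (illustrative sizes)
Rf = 1.003                        # one-period risk-free gross return (assumed)

# Exogenous excess returns; a placeholder for the paper's return dynamics.
R = rng.normal(0.004, 0.04, size=(M, N, d))

# Control randomization: draw weights uniformly on [0, 1]^d, then scale down
# any row that would exceed a fully invested (weights summing to 1) position.
alpha = rng.uniform(0.0, 1.0, size=(M, N, d))
alpha /= np.maximum(alpha.sum(axis=2, keepdims=True), 1.0)

# Endogenous wealth paths via (3.2): W_{n+1} = W_n * (Rf + alpha_n . R_{n+1}).
W = np.ones((M, N + 1))
for n in range(N):
    W[:, n + 1] = W[:, n] * (Rf + np.einsum("md,md->m", alpha[:, n], R[:, n]))
```

The simulated triples $(\bm{Z}_{t_n}^m,\tilde{W}_{t_n}^m,\tilde{\bm{\alpha}}_{t_n}^m)$ (here, returns play the role of the predictors) are the training data for the backward regressions below.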

The second part of the LSMC algorithm uses a discretization procedure. We discretize the control space as $\mathcal{A}^{\mathrm{d}}=\{\bm{a}_{1},\dots,\bm{a}_{J}\}$. We define the continuation value function $\mathrm{CV}_{t_{n}}^{j}$ as the expectation of the subsequent value function conditional on making the decision $\bm{\alpha}_{t_{n}}=\bm{a}_{j}\in\mathcal{A}^{\mathrm{d}}$, ie,

 $\mathrm{CV}_{t_{n}}^{j}(z,w):=\mathbb{E}[v_{t_{n+1}}(\bm{Z}_{t_{n+1}},W_{t_{n+1}})\mid\bm{Z}_{t_{n}}=z,\ W_{t_{n}}=w,\ \bm{\alpha}_{t_{n}}=\bm{a}_{j}].$ (3.4)

Therefore, the value function can be approximated by

 $\displaystyle v_{t_{n}}(z,w)$ $\displaystyle=\sup_{\bm{\alpha}_{t_{n}}\in\mathcal{A}}\mathbb{E}[v_{t_{n+1}}(\bm{Z}_{t_{n+1}},W_{t_{n+1}})\mid\bm{Z}_{t_{n}}=z,\ W_{t_{n}}=w]$ $\displaystyle\approx\max_{\bm{a}_{j}\in\mathcal{A}^{\mathrm{d}}}\mathrm{CV}_{t_{n}}^{j}(z,w).$

To compute this value function, we proceed by backward dynamic programming. At time $t_{N}$, the value function is equal to $\smash{\hat{v}_{t_{N}}(z,w)=(w-L_{W})\bm{1}\{L_{W}\leq w\leq U_{W}\}}$. At time $t_{n}$, assume that the continuation value functions $\smash{\{\hat{\mathrm{CV}}_{t_{n^{\prime}}}^{j}(z,w)\}_{n+1\leq n^{\prime}\leq N-1}^{1\leq j\leq J}}$ have been estimated. To evaluate the continuation value function $\smash{\mathrm{CV}_{t_{n}}^{j}}$ at the current time for each decision $\smash{\bm{a}_{j}\in\mathcal{A}^{\mathrm{d}}}$, we reset the portfolio weights $\smash{\{\bm{\alpha}_{t_{n}}^{m}\}_{1\leq m\leq M}}$ to $\smash{\bm{a}_{j}}$ and recompute the endogenous wealth from $t_{n}$ to $t_{N}$:

 $\displaystyle\hat{W}_{t_{n+1}}^{m,(n,j)}$ $\displaystyle=\tilde{W}_{t_{n}}^{m}(R^{f}+\bm{a}_{j}\bm{R}_{t_{n+1}}^{m}),$ $\displaystyle\hat{W}_{t_{n+2}}^{m,(n,j)}$ $\displaystyle=\hat{W}_{t_{n+1}}^{m,(n,j)}(R^{f}+\arg\max_{\bm{a}_{l}\in\mathcal{A}^{\mathrm{d}}}\{\hat{\mathrm{CV}}_{t_{n+1}}^{l}(\bm{Z}_{t_{n+1}}^{m},\hat{W}_{t_{n+1}}^{m,(n,j)})\}\bm{R}_{t_{n+2}}^{m}),$ $\displaystyle\vdots$ $\displaystyle\hat{W}_{t_{N}}^{m,(n,j)}$ $\displaystyle=\hat{W}_{t_{N-1}}^{m,(n,j)}(R^{f}+\arg\max_{\bm{a}_{l}\in\mathcal{A}^{\mathrm{d}}}\{\hat{\mathrm{CV}}_{t_{N-1}}^{l}(\bm{Z}_{t_{N-1}}^{m},\hat{W}_{t_{N-1}}^{m,(n,j)})\}\bm{R}_{t_{N}}^{m}),$ (3.5)

where

 $\hat{W}_{t_{n^{\prime}}}^{m,(n,j)}:=\hat{W}_{t_{n^{\prime}}}^{m}|_{W_{t_{n}}^{m}=\tilde{W}_{t_{n}}^{m},\bm{\alpha}_{t_{n}}=\bm{a}_{j}},\quad n^{\prime}=n,\dots,N,$

is the recomputed wealth from $t_{n}$ to $t_{N}$, using the portfolio weights $\bm{a}_{j}$ at time $t_{n}$ and the estimated optimal portfolio weights at times $t_{n+1},\dots,t_{N-1}$.

To approximate the continuation value function $\mathrm{CV}_{t_{n}}^{j}(z,w)$, the classical LSMC algorithm regresses the payoffs $\smash{\{f(\hat{W}_{t_{N}}^{m,(n,j)})\}_{1\leq m\leq M}}$ on $\smash{\{\psi_{k}(\bm{Z}_{t_{n}}^{m},\tilde{W}_{t_{n}}^{m})\}_{1\leq m\leq M}^{1\leq k\leq K}}$, where $\smash{\{\psi_{k}(z,w)\}_{1\leq k\leq K}}$ is the vector of basis functions of the state variables. However, the major difficulty here lies in the abrupt upper bound $U_{W}$, which, according to our numerical experiments, can cause large regression errors.

As $f$ censors the values of $\smash{\hat{W}_{t_{N}}^{m,(n,j)}}$ outside the targeted range $[L_{W},U_{W}]$, our regression problem looks similar to a censored regression problem, for which a common estimation approach is maximum likelihood estimation (MLE). However, the main difference between our problem and a censored regression problem is that we have access to both the censored samples $\smash{\{f(\hat{W}_{t_{N}}^{m,(n,j)})\}_{1\leq m\leq M}}$ and the uncensored samples $\smash{\{\hat{W}_{t_{N}}^{m,(n,j)}\}_{1\leq m\leq M}}$. Thus, MLE would ignore the information of the uncensored values $\smash{\hat{W}_{t_{N}}^{m,(n,j)}}$ that are also observable in this estimation problem. The availability of this extra piece of information motivates us to propose a two-stage regression that takes advantage of this information. We now describe this technique in detail.

### 3.3 Two-stage least squares Monte Carlo

This two-stage regression works as follows.

1. (1)

Instead of regressing the payoffs $\smash{\{f(\hat{W}_{t_{N}}^{m,(n,j)})\}_{1\leq m\leq M}}$, we regress the wealth $\smash{\{\hat{W}_{t_{N}}^{m,(n,j)}\}_{1\leq m\leq M}}$ on $\smash{\{\psi_{k}(\bm{Z}_{t_{n}}^{m},\tilde{W}_{t_{n}}^{m})\}_{1\leq m\leq M}^{1\leq k\leq K}}$ to obtain

 $\displaystyle\Big\{\hat{\beta}_{k,t_{n}}^{j}\Big\}_{1\leq k\leq K}$ $\displaystyle=\arg\min_{\beta\in\mathbb{R}^{K}}\sum_{m=1}^{M}\bigg(\sum_{k=1}^{K}\beta_{k}\psi_{k}(\bm{Z}_{t_{n}}^{m},\tilde{W}_{t_{n}}^{m})-\hat{W}_{t_{N}}^{m,(n,j)}\bigg)^{\!2},$ $\displaystyle\hat{\sigma}_{t_{n}}^{j}$ $\displaystyle=\sqrt{\frac{1}{M-K}\sum_{m=1}^{M}\bigg(\hat{W}_{t_{N}}^{m,(n,j)}-\sum_{k=1}^{K}\hat{\beta}_{k,t_{n}}^{j}\psi_{k}(\bm{Z}_{t_{n}}^{m},\tilde{W}_{t_{n}}^{m})\bigg)^{\!2}}.$ (3.6)

As a result, the terminal wealth can be modeled as

 $\hat{W}_{t_{N}}^{(n,j)}=\hat{\mu}_{t_{n}}^{j}(z,w)+\hat{\sigma}_{t_{n}}^{j}\varepsilon,\qquad\hat{\mu}_{t_{n}}^{j}(z,w):=\sum_{k=1}^{K}\hat{\beta}_{k,t_{n}}^{j}\psi_{k}(z,w),$ (3.7)

where $\varepsilon$ is the regression residual, which for demonstrative purposes we assume to be Gaussian. (Note that an assumption for the distribution of the residuals is also required by MLE.) Let

 $\phi(x)=\frac{1}{\sqrt{2\pi}}\exp\bigg({-}\frac{x^{2}}{2}\bigg)$

represent the standard normal probability density function and let

 $\varPhi(x)=\int_{-\infty}^{x}\phi(u)\,\mathrm{d}u$

represent the standard normal cumulative distribution function.

2. (2)

Plug (3.7) into the continuation value formula (3.4) to obtain a closed-form estimate. By combining (3.4)–(3.7), we obtain the following closed-form estimate of the continuation value function for each $\bm{a}_{j}\in\mathcal{A}^{\mathrm{d}}$ at time $t_{n}$:

 $\displaystyle\hat{\mathrm{CV}}_{t_{n}}^{j}(z,w)$ $\displaystyle=\mathbb{E}[(W_{t_{N}}-L_{W})\bm{1}\{L_{W}\leq W_{t_{N}}\leq U_{W}\}\mid\bm{Z}_{t_{n}}=z,\ W_{t_{n}}=w,\ \bm{\alpha}_{t_{n}}=\bm{a}_{j}]$ $\displaystyle=\mathbb{E}_{\varepsilon}[(\hat{\mu}_{t_{n}}^{j}(z,w)+\hat{\sigma}_{t_{n}}^{j}\varepsilon-L_{W})\times\bm{1}\{L_{W}\leq\hat{\mu}_{t_{n}}^{j}(z,w)+\hat{\sigma}_{t_{n}}^{j}\varepsilon\leq U_{W}\}]$ $\displaystyle=(\hat{\mu}_{t_{n}}^{j}(z,w)-L_{W})\mathbb{E}_{\varepsilon}\bigg[\bm{1}\bigg\{\frac{L_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\leq\varepsilon\leq\frac{U_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg\}\bigg]+\hat{\sigma}_{t_{n}}^{j}\mathbb{E}_{\varepsilon}\bigg[\varepsilon\bm{1}\bigg\{\frac{L_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\leq\varepsilon\leq\frac{U_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg\}\bigg]$ $\displaystyle=(\hat{\mu}_{t_{n}}^{j}(z,w)-L_{W})\bigg(\varPhi\bigg(\frac{U_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg)-\varPhi\bigg(\frac{L_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg)\bigg)-\hat{\sigma}_{t_{n}}^{j}\bigg(\phi\bigg(\frac{U_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg)-\phi\bigg(\frac{L_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg)\bigg),$ (3.8)

where the last equality is obtained by direct integration.

3. (3)

The mappings $\hat{\bm{\alpha}}_{t_{n}}\colon(z,w)\mapsto\hat{\bm{\alpha}}_{t_{n}}(z,w)$ and $\hat{v}_{t_{n}}\colon(z,w)\mapsto\hat{v}_{t_{n}}(z,w)$ are estimated by

 $\hat{\bm{\alpha}}_{t_{n}}(z,w)=\arg\max_{\bm{a}_{j}\in\mathcal{A}^{\mathrm{d}}}\hat{\mathrm{CV}}_{t_{n}}^{j}(z,w)\quad\text{and}\quad\hat{v}_{t_{n}}(z,w)=\max_{\bm{a}_{j}\in\mathcal{A}^{\mathrm{d}}}\hat{\mathrm{CV}}_{t_{n}}^{j}(z,w).$ (3.9)

In summary, thanks to the censored linear shape of the skewed target range function in (2.2), the conditional expectations in the dynamic programming equations (3.3) can be estimated by the closed-form formula (3.8). Due to the linearity of the regressand $\smash{\hat{W}_{t_{N}}^{m,(n,j)}}$ in (3.6), this two-stage regression is much more robust and stable than a direct regression of $\smash{f(\hat{W}_{t_{N}}^{m,(n,j)})}$. Section 4.1 describes a similar closed-form conditional value for the CRRA utility approach, and Section 5.3 illustrates the numerical improvements provided by this two-stage LSMC method.
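For concreteness, the two stages can be sketched as follows under the Gaussian-residual assumption of this section; the function and variable names are ours, and `psi` stands for the basis matrix $\{\psi_{k}(\bm{Z}_{t_{n}}^{m},\tilde{W}_{t_{n}}^{m})\}$ for one fixed decision $\bm{a}_{j}$.

```python
import numpy as np
from scipy.stats import norm

def two_stage_cv(psi, w_T, lower=1.0, upper=1.2):
    """Two-stage estimate of the continuation value (3.8) for one decision a_j.

    psi:  (M, K) basis functions evaluated at the time-t_n states.
    w_T:  (M,)   recomputed terminal wealth from (3.5).
    Stage 1 regresses raw wealth, (3.6)-(3.7); stage 2 plugs the Gaussian
    residual model into the censored expectation, in closed form.
    """
    M, K = psi.shape
    beta, *_ = np.linalg.lstsq(psi, w_T, rcond=None)    # (3.6)
    resid = w_T - psi @ beta
    sigma = np.sqrt(resid @ resid / (M - K))            # residual std dev
    mu = psi @ beta                                     # fitted mean per path
    a, b = (lower - mu) / sigma, (upper - mu) / sigma
    return (mu - lower) * (norm.cdf(b) - norm.cdf(a)) \
        - sigma * (norm.pdf(b) - norm.pdf(a))           # (3.8)
```

The decision with the largest estimated continuation value is then selected pathwise, as in (3.9).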

More generally, the approach proposed here (the linear approximation in (3.7) plus the decensored corrections in (3.8)) can be adapted to situations where residuals are non-Gaussian: this would simply modify the correction terms in (3.8). There is no restriction on the choice of the residual distribution or on the estimation method (empirical distribution, kernel estimation, normal mixture, etc). Nevertheless, assuming normal residuals is reasonable for low-frequency trading, such as the monthly rebalancing considered in our numerical experiments in Section 5. In addition, the properties of the wealth distribution can be accurately captured by regressing $\{\hat{W}_{t_{N}}^{m,(n,j)}\}_{1\leq m\leq M}$ on basis functions of $\{\tilde{W}_{t_{n}}^{m}\}_{1\leq m\leq M}$; in our numerical experiments, the resulting residuals are indeed very close to normal. For these reasons, and for demonstration purposes, we henceforth assume normality of residuals and focus on analyzing the effects of the new investment objective (2.2).

### 3.4 State-dependent standard deviation

An important assumption made in the previous subsection is that $\smash{\hat{\sigma}_{t_{n}}^{j}}$ only depends on the portfolio decision $\smash{\bm{a}_{j}}$ and not on the state variables $(\bm{Z}_{t_{n}},W_{t_{n}})$. This subsection describes how to improve the standard deviation estimate to incorporate state variables. Similar to the approximation of $\smash{\hat{\mu}_{t_{n}}^{j}(z,w)}$, the state-dependent standard deviation $\smash{\hat{\sigma}_{t_{n}}^{j}(z,w)}$ can be approximated by the exponential of a linear combination of basis functions of the state variables,

 $\hat{\sigma}_{t_{n}}^{j}(z,w)=\exp\bigg(\sum_{k=1}^{K^{\prime}}\hat{\eta}_{k,t_{n}}^{j}\psi_{k}(z,w)\bigg).$

The purpose of the exponential transform is to avoid the possibility of negative standard deviation estimates. Then, the two-stage regression becomes

 $\displaystyle\hat{W}_{t_{N}}^{(n,j)}=\hat{\mu}_{t_{n}}^{j}(z,w)+\varepsilon,$ $\displaystyle\varepsilon\sim\mathcal{N}(0,(\hat{\sigma}_{t_{n}}^{j}(z,w))^{2}),$ $\displaystyle\hat{\mu}_{t_{n}}^{j}(z,w)=\sum_{k=1}^{K}\hat{\beta}_{k,t_{n}}^{j}\psi_{k}(z,w),$ $\displaystyle\hat{\sigma}_{t_{n}}^{j}(z,w)=\exp\bigg(\sum_{k=1}^{K^{\prime}}\hat{\eta}_{k,t_{n}}^{j}\psi_{k}(z,w)\bigg).$

Note that a standard least squares regression cannot be used to estimate an unobservable variable such as standard deviation. Instead, we use MLE. We first perform a least squares regression to approximate the mean $\smash{\hat{\mu}_{t_{n}}^{j}(z,w)}$ and then approximate the logarithmic standard deviation $\smash{\log\hat{\sigma}_{t_{n}}^{j}(z,w)}$ by maximizing the following loglikelihood function:

 $\displaystyle\mathcal{L}(\eta\mid\bm{Z}_{t_{n}},\tilde{W}_{t_{n}},\hat{W}_{t_{N}}^{(n,j)})=\sum_{m=1}^{M}\bigg\{{-}\sum_{k=1}^{K^{\prime}}\eta_{k,t_{n}}^{j}\psi_{k}(\bm{Z}_{t_{n}}^{m},\tilde{W}_{t_{n}}^{m})-\frac{(\hat{\varepsilon}^{m})^{2}}{2}\exp\bigg({-}2\sum_{k=1}^{K^{\prime}}\eta_{k,t_{n}}^{j}\psi_{k}(\bm{Z}_{t_{n}}^{m},\tilde{W}_{t_{n}}^{m})\bigg)\bigg\},$

where

 $\hat{\varepsilon}^{m}=\hat{W}_{t_{N}}^{m,(n,j)}-\sum_{k=1}^{K}\hat{\beta}_{k,t_{n}}^{j}\psi_{k}(\bm{Z}_{t_{n}}^{m},\tilde{W}_{t_{n}}^{m}).$

We use the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm to perform the maximization of this loglikelihood function. In Section 5.3, we compare the results obtained with and without state dependency in the standard deviation estimate.
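This estimation step can be sketched as follows, assuming Gaussian residuals and using BFGS via `scipy.optimize.minimize` (we minimize the negative loglikelihood rather than maximize); the function name is ours.

```python
import numpy as np
from scipy.optimize import minimize

def fit_log_sigma(psi, resid):
    """MLE of eta in log sigma(z, w) = sum_k eta_k psi_k(z, w) (Section 3.4).

    psi:   (M, K') basis functions at the time-t_n states.
    resid: (M,)    first-stage least squares residuals eps_hat^m.
    """
    def neg_loglik(eta):
        log_sig = psi @ eta
        # Negative of the loglikelihood, dropping eta-independent constants.
        return np.sum(log_sig + 0.5 * resid**2 * np.exp(-2.0 * log_sig))

    return minimize(neg_loglik, np.zeros(psi.shape[1]), method="BFGS").x
```

The fitted coefficients give $\hat{\sigma}_{t_{n}}^{j}(z,w)$ through the exponential transform, which keeps the estimate strictly positive by construction.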

### 3.5 Upper target as stop-profit

As discussed in Section 2, the main purpose of the upper target $U_{W}$ in the performance measure is to reduce downside risk. However, in multiperiod optimization, a paradox might occur when the realized wealth overshoots the upper target: by default, the portfolio optimizer might tell the fund manager to pick the assets that are most likely to fall. It is easy to see that, when $\smash{W_{t}\geq U_{W}(R^{f})^{-(T-t)}}$, one can outperform the upper target with certainty by henceforth investing an amount of wealth $\smash{U_{W}(R^{f})^{-(T-t)}}$ in the risk-free asset and taking the balance $\smash{W_{t}-U_{W}(R^{f})^{-(T-t)}}$ out of the problem. To implement such a correction, two approaches are possible.

1. (1)

One can replace $T$ by $\min\{T,\tau\}$ in the value function in (2.1), where $\tau$ is the first (stopping) time such that $\smash{W_{\tau}\geq U_{W}(R^{f})^{-(T-\tau)}}$. At time $\tau$ (if it occurs before $T$), the dynamic optimization stops: the amount $\smash{U_{W}(R^{f})^{-(T-\tau)}}$ is invested in the risk-free asset, and the balance $\smash{W_{\tau}-U_{W}(R^{f})^{-(T-\tau)}}$ is taken out.

2. (2)

One can add an extra dynamic control to the problem: dynamic withdrawal/consumption (see, for example, Dang et al 2017).

For simplicity, we use the first approach in this paper. Based on our numerical experiments, we find that imposing this stop-profit rule does not significantly affect the terminal wealth distribution, as usually only a very small portion of wealth realizations overshoot the upper bound. For example, we show in the numerical section that about 1% of the realizations overshoot the upper bound for $[L_{W}=1.0,U_{W}=1.1]$; this value becomes virtually 0% for $[L_{W}=1.0,U_{W}=1.2]$.
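In code, the first stop-profit rule reduces to a threshold check along each simulated path. In this sketch the function name is ours, `upper` is the Figure 1 example target, and `Rf` is an assumed one-period risk-free gross return:

```python
def stop_profit_time(wealth_path, Rf=1.003, upper=1.2):
    """First time tau with W_tau >= U_W * (R^f)^-(N - tau), per Section 3.5.

    wealth_path has one entry per rebalancing date t_0, ..., t_N; from tau
    onward, the upper target can be locked in with the risk-free asset alone.
    Returns None if the threshold is never crossed.
    """
    N = len(wealth_path) - 1
    for n, w in enumerate(wealth_path):
        if w >= upper * Rf ** -(N - n):
            return n
    return None
```

When the rule triggers, the path is frozen: the discounted upper target is placed in the risk-free asset and the surplus is withdrawn.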

## 4 Extensions

This section adapts the two-stage LSMC method to alternative investment objectives. We start by describing how to use the two-stage LSMC method to deal with the CRRA utility approach. We then adapt the formulation of the STRS to the FTRS, which purely maximizes the probability of achieving a prespecified target range without further attempts to rally for profits, and to target range strategies based on a stochastic benchmark, for which the absolute fixed target range is replaced by a relative target range.

### 4.1 CRRA utility

In the classical LSMC approach, a conditional expected utility of the type

 $\mathbb{E}[\mathcal{U}(W_{T})\mid\bm{Z}_{t_{n}}=z,\ W_{t_{n}}=w]$

would be approximated by $\beta\psi(z,w)$, which may lead to large numerical errors when the utility function $\mathcal{U}$ is highly nonlinear (see Van Binsbergen and Brandt 2007; Garlappi and Skoulakis 2009; Denault and Simonato 2017; Zhang et al 2019; Andreasson and Shevchenko 2018). The proposed two-stage regression avoids this nonlinearity problem and greatly improves the stability of the LSMC method. In this subsection, we derive the two-stage continuation value estimates for the CRRA utility approach. These estimates involve the following special functions.

• Gamma function:

 $\varGamma(z)=\int_{0}^{\infty}t^{z-1}\exp(-t)\,\mathrm{d}t.$
• Rising factorial:

 $z^{(n)}=\frac{\varGamma(z+n)}{\varGamma(z)}.$
• Confluent hypergeometric function of the first kind:

 ${}_{1}F_{1}(a,b,z)=\sum_{n=0}^{\infty}\frac{a^{(n)}}{b^{(n)}}\frac{z^{n}}{n!}.$
• Confluent hypergeometric function of the second kind:

 $\varPsi(a,b,z)=\frac{\varGamma(1-b)}{\varGamma(a-b+1)}{}_{1}F_{1}(a,b,z)+\frac{\varGamma(b-1)}{\varGamma(a)}z^{1-b}{}_{1}F_{1}(a-b+1,2-b,z).$
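The confluent hypergeometric series can be sanity-checked with a few lines of Python; the naive truncated series below is a sketch for moderate arguments only, not a production implementation.

```python
from math import exp

def hyp1f1(a, b, z, terms=60):
    """Naive truncated series for the confluent hypergeometric function
    1F1(a, b, z), using the term recurrence
    term_{n+1} = term_n * (a + n) / (b + n) * z / (n + 1).
    Adequate only for moderate |z|; use a library routine in practice."""
    term, total = 1.0, 1.0
    for n in range(terms):
        term *= (a + n) / (b + n) * z / (n + 1)
        total += term
    return total

# Sanity check against the identity 1F1(a, a, z) = exp(z)
assert abs(hyp1f1(2.0, 2.0, 1.5) - exp(1.5)) < 1e-10
```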

Assume that the conditional mean of the terminal wealth $\smash{\hat{\mu}_{t_{n}}^{j}(z,w)}$ and the standard deviation $\smash{\hat{\sigma}_{t_{n}}^{j}}$ have been estimated according to (3.6) and (3.7). Then, using the general formula for the real moments of a Gaussian distribution (Winkelbauer 2014), the continuation value function in the CRRA utility approach is given by

 $\displaystyle\hat{\mathrm{CV}}_{t_{n}}^{j}(z,w)$ $\displaystyle=\mathbb{E}\bigg{[}\frac{\hat{W}_{t_{N}}^{1-\gamma}}{1-\gamma}\biggm{|}\bm{Z}_{t_{n}}=z,\ W_{t_{n}}=w,\ \bm{\alpha}_{t_{n}}=\bm{a}_{j}\bigg{]}$ $\displaystyle=\frac{(\hat{\sigma}_{t_{n}}^{j})^{1-\gamma}}{1-\gamma}(-i\sqrt{2})^{1-\gamma}\varPsi\bigg{(}{-}\frac{1-\gamma}{2},\frac{1}{2},-\frac{1}{2}\bigg{(}\frac{\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg{)}^{\!2}\bigg{)}.$ (4.1)

We use this closed-form formula for the numerical comparisons in Section 5.3.

### 4.2 Flat target range strategy

The return distribution produced by the STRS (2.2) is skewed toward the upper return target. Yet, other types of portfolio optimization problems exist (such as life-cycle and insurance-related investments) for which the ability to remain solvent prevails over the appetite for high expected returns. For such problems, one can adjust the skewed target range shape (2.2) to a flat target range shape given by

 $f(w)=\bm{1}\{L_{W}\leq w\leq U_{W}\}.$ (4.2)

Figure 2 illustrates the flat target range function (4.2) with $[L_{W},U_{W}]=[1.0,1.2]$.

Then, the portfolio optimization problem becomes

 $\displaystyle v_{t}(z,w)$ $\displaystyle=\sup_{\{\bm{\alpha}_{\tau}\in\mathcal{A}\}_{t\leq\tau\leq T}}\mathbb{E}[\bm{1}\{L_{W}\leq W_{T}\leq U_{W}\}\mid\bm{Z}_{t}=z,\ W_{t}=w]$ $\displaystyle=\sup_{\{\bm{\alpha}_{\tau}\in\mathcal{A}\}_{t\leq\tau\leq T}}\mathbb{P}[L_{W}\leq W_{T}\leq U_{W}\mid\bm{Z}_{t}=z,\ W_{t}=w],$ (4.3)

which is a pure probability maximizing strategy.

The conservative FTRS can be deemed more flexible than the classical VaR minimization approach: when $U_{W}=+\infty$, the FTRS (4.3) and VaR minimization achieve comparable investment outcomes, the difference being a fixed, absolute cutoff level for the former and an implicit, relative cutoff level for the latter. In particular, the FTRS minimizes the probability of being below a particular loss level, while the VaR procedure minimizes a particular loss quantile. When $U_{W}$ is finite, the FTRS provides greater flexibility for investors to devise their risk preferences, as the lower return target $L_{W}$ in such circumstances is an explicit input from the investor, and the option to fix an upper target $U_{W}$ broadens the range of possible risk profiles.

Assuming that the conditional mean of the terminal wealth $\smash{\hat{\mu}_{t_{n}}^{j}(z,w)}$ and the standard deviation $\smash{\hat{\sigma}_{t_{n}}^{j}}$ have been estimated according to (3.6) and (3.7), the continuation value function is simply given by

 $\displaystyle\hat{\mathrm{CV}}_{t_{n}}^{j}(z,w)$ $\displaystyle=\mathbb{E}[\bm{1}\{L_{W}\leq W_{t_{N}}\leq U_{W}\}\mid\bm{Z}_{t_{n}}=z,\ W_{t_{n}}=w,\ \bm{\alpha}_{t_{n}}=\bm{a}_{j}]$ $\displaystyle=\mathbb{P}_{\varepsilon}[L_{W}\leq\hat{\mu}_{t_{n}}^{j}(z,w)+\hat{\sigma}_{t_{n}}^{j}\varepsilon\leq U_{W}]$ $\displaystyle=\varPhi\bigg{(}\frac{U_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg{)}-\varPhi\bigg{(}\frac{L_{W}-\hat{\mu}_{t_{n}}^{j}(z,w)}{\hat{\sigma}_{t_{n}}^{j}}\bigg{)}.$ (4.4)
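Since (4.4) involves nothing beyond the standard normal cumulative distribution function $\varPhi$, it can be evaluated directly; a minimal sketch using Python's standard library (function and argument names are illustrative):

```python
from statistics import NormalDist

def ftrs_continuation_value(mu, sigma, L_W, U_W):
    """Closed-form continuation value (4.4): probability that the Gaussian
    proxy mu + sigma * eps for terminal wealth lands in [L_W, U_W]."""
    Phi = NormalDist().cdf  # standard normal CDF
    return Phi((U_W - mu) / sigma) - Phi((L_W - mu) / sigma)

# Mean centered in a [1.0, 1.2] range with sigma = 0.05: P(|Z| <= 2) ~ 0.954
p = ftrs_continuation_value(1.1, 0.05, 1.0, 1.2)
```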

### 4.3 Target range over a stochastic benchmark

It is also possible to define the return thresholds $L_{W}$ and $U_{W}$ relative to a stochastic benchmark, be it a stock market index, an inflation rate, an exchange rate or an interest rate. We refer to Franks (1992), Browne (1999a), Brogan and Stidham (2005) and Gaivoronski et al (2005) for classical investment strategies that aim to outperform a stochastic benchmark.

Denote by $B$ the stochastic benchmark of interest, and define the relative excess wealth as $W-B$. We can then modify the target range function as

 $f_{B}(w,b):=(w-b)\bm{1}\{L_{W}\leq w-b\leq U_{W}\}$ (4.5)

for STRS, and as

 $f_{B}(w,b):=\bm{1}\{L_{W}\leq w-b\leq U_{W}\}$ (4.6)

for FTRS.

The stochastic benchmark $B$ can be modeled simply as one additional exogenous state variable. Therefore, this new problem can be solved using the same approach developed in Section 3.
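The two modified payoffs (4.5) and (4.6) are straightforward to express in code; a hypothetical sketch:

```python
def f_B_strs(w, b, L_W, U_W):
    """Skewed target range payoff (4.5) applied to the excess wealth w - b."""
    x = w - b
    return x if L_W <= x <= U_W else 0.0

def f_B_ftrs(w, b, L_W, U_W):
    """Flat target range payoff (4.6) applied to the excess wealth w - b."""
    return 1.0 if L_W <= w - b <= U_W else 0.0
```

With the benchmark realization carried along as an extra state variable, these replace the absolute target range functions in the backward recursion.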

## 5 Numerical experiments

In this section, we test the STRS and illustrate how it can achieve the investor’s range objective. Table 1 summarizes the asset classes and the exogenous state variables used in our numerical experiments. We consider a portfolio invested in five assets: risk-free cash, US bonds (AGG), US shares (SPY), international shares (IFA) and emerging market shares (EEM); the other assets listed in Table 1 are used as return predictors.

The annual interest rate on the cash component is set at $2\%$. We assume $0.1\%$ proportional transaction costs, and we refer to Zhang et al (2019) for how to deal with switching costs in an LSMC algorithm with endogenous variables. A first-order vector autoregression model is calibrated to the monthly log returns of the assets listed in Table 1 from September 2003 to March 2016. By bootstrapping the residuals, 10 000 simulation paths are generated for one year with monthly time steps. The two-stage regression method approximates a linear wealth $W_{T}$ but not a concave utility $\mathcal{U}(W_{T})$; as a result, a sample of 10 000 paths can be deemed sufficient to reach numerical stability, as reported in Van Binsbergen and Brandt (2007) and Zhang et al (2019). For the same reason, we use a simple second-order multivariate polynomial as the basis functions for the linear least squares regressions in the algorithm. For simplicity, all the reported distributions are simulated in-sample, which might in theory make the estimates upward biased. In the numerical experiments, we use a mesh of 0.2 increments for the discrete control grid and we do not allow short-selling or borrowing. Apart from Section 5.3, where a state-dependent standard deviation is tested, the state-independent standard deviation is used for all other numerical experiments. The program is coded in Python 3.4.3, and it takes approximately two hours on a 2.2 GHz Intel Core i7 CPU to complete the computation for $M=10\,000$ paths, 12 time steps, 13 state variables, a second-order polynomial basis and a control mesh of 0.2 for a five-dimensional portfolio.
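As an illustration of the residual-bootstrap simulation step, the sketch below resamples fitted residuals with replacement to generate return paths; for brevity it uses a single-asset AR(1) stand-in for the paper's first-order vector autoregression, and all names are illustrative.

```python
import random

def simulate_bootstrap_paths(r0, phi, residuals, n_paths, n_steps, seed=0):
    """Generate return paths by resampling fitted residuals with
    replacement (residual bootstrap). Single-asset AR(1) stand-in for the
    calibrated first-order vector autoregression; names are illustrative."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        r, path = r0, []
        for _ in range(n_steps):
            r = phi * r + rng.choice(residuals)  # one bootstrapped shock
            path.append(r)
        paths.append(path)
    return paths

# The paper uses 10 000 one-year monthly paths; a smaller example here
paths = simulate_bootstrap_paths(0.01, 0.2, [-0.02, 0.0, 0.02], 1000, 12)
```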

### 5.1 Wealth distribution

Figure 3 provides some examples of estimated distributions of terminal portfolio value when using the STRS. We recall that the portfolio value $W$ and the bounds $[L_{W},U_{W}]$ are scaled by the initial wealth, so, without loss of generality, we assume $W_{0}=1.00$. The lower target $L_{W}$ is set to the initial wealth level $1.00$, a natural choice representing the preference of investors for capital protection. Four different upper targets $U_{W}$ are tested: $1.05$, $1.10$, $1.20$ and $1.30$.

Several comments can be made about the shape of the terminal wealth distribution produced by the STRS in Figure 3. The most striking observation is that the STRS confines most of the wealth realizations within the predefined target range, and for low upper target levels $U_{W}=1.05$ and $U_{W}=1.10$ the wealth distributions mimic, to some extent, the shape of the skewed target range function (2.2), making downside risk negligible. This suggests the two-stage LSMC algorithm is indeed capable of handling an abrupt discontinuous payoff function properly. There are some wealth realizations lying above the upper bound, which, in spite of the first correction described in Section 3.5, may occur due to the discrete-time nature of monthly rebalancing. (A large upward jump can occur during one single month, after which the risky investment is immediately stopped, as described in Section 3.5.)

As expected, setting the upper target $U_{W}$ to a higher level produces a higher expected terminal wealth with higher standard deviation and greater downside risk (as measured by the probability of losing capital). At the same time, the higher the upper target $U_{W}$, the harder it is for the terminal wealth distribution to be skewed toward the upper target. Regarding the tails beyond the targeted range, the two low upper target levels $U_{W}=1.05$ and $U_{W}=1.10$ produce larger right tails, while the two higher levels $U_{W}=1.20$ and $U_{W}=1.30$ produce larger left tails. This is consistent with the fact that the greater $U_{W}$, the bigger the risk the investor is willing to take to achieve a higher return. This illustrates the capability of the STRS to cater to different risk appetites.

An interesting quantity to monitor is the ratio $\mathcal{R}:=(\mathbb{E}[W_{T}]-L_{W})/(U_{W}-L_{W})$, which measures the location of the expected performance $\mathbb{E}[W_{T}]$ relative to the targeted range: $\mathcal{R}=0\%$ means $\mathbb{E}[W_{T}]=L_{W}$, while at the opposite end $\mathcal{R}=100\%$ means $\mathbb{E}[W_{T}]=U_{W}$. In our experiments from Figure 3, $\mathcal{R}$ is a decreasing function of $U_{W}$, from $\mathcal{R}=72\%$ for $U_{W}=1.05$ down to $\mathcal{R}=38\%$ for $U_{W}=1.30$. This illustrates the natural fact that the higher the desired upper target, the harder it is to achieve it. One visible drawback of the proposed strategy is the relatively long left tail when both the upper and lower targets are set to relatively high levels, eg, $L_{W}\geq 1.00$ and $U_{W}\geq 1.20$.
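A hypothetical helper for this ratio, checked against the $U_{W}=1.05$ case reported above (by definition, $\mathcal{R}=72\%$ with $[L_{W},U_{W}]=[1.0,1.05]$ corresponds to $\mathbb{E}[W_{T}]=1.0+0.72\times 0.05=1.036$):

```python
def target_range_location(expected_wealth, L_W, U_W):
    """R = (E[W_T] - L_W) / (U_W - L_W): where the expected terminal
    wealth sits within the target range (0 = lower target, 1 = upper)."""
    return (expected_wealth - L_W) / (U_W - L_W)

# R = 72% for U_W = 1.05 corresponds to E[W_T] = 1.0 + 0.72 * 0.05 = 1.036
r = target_range_location(1.036, 1.0, 1.05)
```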

Figure 4 shows the time evolution of the wealth distribution (0.05th percentile to 99.95th percentile) over the whole investment horizon, for the STRS with $[L_{W}=1.0,U_{W}=1.1]$ (part (a)), $[L_{W}=1.0,U_{W}=1.2]$ (part (b)), $[L_{W}=1.0,U_{W}=\infty]$ (part (c)) and $[L_{W}=0,U_{W}=\infty]$ (part (d)), where the last strategy is equivalent to maximizing the expected terminal wealth without taking risk into account. The results show that the wealth distributions in parts (a) and (b) are well tightened within the prespecified target ranges over the whole investment process, which contrasts with the case $U_{W}=\infty$ in parts (c) and (d). Once again, as upside potential and downside risk are naturally intertwined, we cannot protect against downside risk very well when the upper target is set to a very high level, as shown by the $[L_{W}=1.0,U_{W}=\infty]$ example (part (c)).

### 5.2 Sensitivity analysis and choice of $L_{W}$

The next experiment is a sensitivity analysis of the expected terminal wealth, standard deviation and downside risk with respect to the bounds of the STRS. Figure 5 shows how the expected terminal wealth ($\mathbb{E}[W_{T}]$, parts (a) and (b)), the standard deviation of the terminal wealth ($\text{SD}[W_{T}]$, parts (c) and (d)) and the downside risk ($\mathbb{P}[W_{T}<1]$, parts (e) and (f)) are affected by changes in the upper bound $U_{W}$ (left column) and by changes in the lower bound $L_{W}$ (right column).

The left column of Figure 5 shows how the expectation $\mathbb{E}[W_{T}]$, standard deviation $\text{SD}[W_{T}]$ and downside risk $\mathbb{P}[W_{T}<1]$ increase with $U_{W}$, although a plateau is reached around $U_{W}=1.5$ for $\mathbb{P}[W_{T}<1]$ and around $U_{W}=1.8$ for $\mathbb{E}[W_{T}]$.

In the right column, one can see that the standard deviation $\text{SD}[W_{T}]$ and downside risk $\mathbb{P}[W_{T}<1]$ both increase when $L_{W}$ moves away from the initial wealth $W_{0}=1.0$. When $L_{W}>1.0$, both risk measures increase with $|L_{W}-W_{0}|$ due to the additional risk required at the beginning of the trading period to force the portfolio value to grow from $W_{0}=1.0$ to the lower target $L_{W}>W_{0}=1.0$. When $L_{W}<1.0$, both risk measures also increase with $|W_{0}-L_{W}|$ due to the lack of immediate loss penalization. Nevertheless, the net effect of $L_{W}$ on $\mathbb{E}[W_{T}]$ is mostly negligible. As a result, these observations suggest that $L_{W}=W_{0}=1.0$ is an appropriate choice for the lower bound of the targeted interval, from which the upper bound $U_{W}$ can be set according to the risk preference and the return requirement of the investor.

### 5.3 Model validation

The following experiment aims at validating the two-stage LSMC method via a comparison with the classical LSMC method. We first study a CRRA utility optimization example. It has been noted that a simulation-and-regression approach can generate large numerical errors when the utility function is highly nonlinear (high risk aversion): see, for example, Van Binsbergen and Brandt (2007), Garlappi and Skoulakis (2009) and Denault and Simonato (2017). We apply the two-stage LSMC method and the classical LSMC method to CRRA utility optimization and then compare the resulting initial-value function estimates $\hat{v}_{0}=(1/M)\sum_{m=1}^{M}(\hat{W}_{t_{N}}^{m})^{1-\gamma}/(1-\gamma)$ for a one-year time horizon with monthly rebalancing. Following Zhang et al (2019), we choose $M=10\,000$ sample paths to ensure numerical stability of the solution. For the classical LSMC method, we include the utility function itself as part of the regression basis, so that the regression basis can be adjusted to some extent to the risk-aversion parameter. Figure 6 shows that the classical LSMC method becomes unstable when the value of $\gamma$ is high, while the two-stage LSMC method converges quite well. In our experiment, the two-stage LSMC method can readily approximate the CRRA utility optimization approach up to $\gamma=100$.

We then compare our two-stage LSMC with the classical LSMC for solving the STRS. To check the possibility of heteroscedastic residuals, we calibrate a state-dependent standard deviation $\sigma(z,w)$, as described in Section 3.4, and compare it with the original two-stage LSMC method in which the standard deviation only depends on the portfolio decision. In particular, we use a simple linear basis to approximate the logarithmic standard deviation. Figure 2 shows that the two-stage LSMC method substantially improves the estimates $\hat{v}_{0}$ and the return distributions, compared with the classical LSMC approach, while using a state-dependent standard deviation does not significantly improve the results, suggesting that the assumption of homoscedastic residuals is reasonable.

### 5.4 STRS and CRRA

We now compare the STRS with the CRRA utility optimization approach. Our main finding regarding this comparison is that for each risk-aversion level $\gamma$ of the CRRA utility approach, one can find a target range $[L_{W},U_{W}]$ such that the STRS delivers a similar expectation, but with a lower standard deviation and a lower downside risk. As an illustration, Figure 7 shows how the STRS with $[L_{W},U_{W}]=[0.93,1.53]$ outperforms the CRRA utility approach with $\gamma=10$. Despite the better statistical moments of the STRS, the shorter right tail of the STRS compared with the CRRA utility approach can be deemed a shortcoming of our approach, although rescinding some upside potential is the reason for the improved downside risk protection compared with the CRRA utility approach.

To provide a more comprehensive comparison, we now report two risk–return trade-offs: the mean–variance efficient frontier and the trade-off between return and downside risk. Figure 8 displays the efficient frontiers of the STRS (for different combinations of $L_{W}$ and $U_{W}$) and the CRRA utility approach (for different $\gamma$ levels) for a three-month investment horizon. The results show that the STRS and the CRRA utility approach trace out a similar mean–variance efficient frontier, while the STRS delivers a better downside risk–return trade-off. Note that the STRS and the CRRA utility approach produce similar results when the risk-aversion parameter is either very small (risk neutral) or very high, while the STRS is preferable for intermediate risk-aversion levels.

A theoretical proof of the higher efficiency of the STRS compared with classical utility strategies would be desirable to corroborate our numerical findings. However, given, for example, the difficulty in deriving an explicit optimal allocation for a single trading period with a simpler downside risk minimization objective (Klebaner et al 2017), a theoretical proof of the higher efficiency of the STRS over classical utility strategies might be out of reach. We thus leave this question for further research.

### 5.5 Extensions

This subsection discusses the wealth distributions produced by the modified target range strategies described in Section 4. Figure 9 provides examples for the FTRS with $L_{W}=1.0$ and $U_{W}=1.05$, $1.10$, $1.20$ and $+\infty$. The main observation is that, as expected, the probability of the terminal wealth lying outside the predefined range $[L_{W},U_{W}]$ is smaller than for the STRS (cf. Figure 3). This is the main strength of the FTRS: downside risk is kept to a minimum, but the price to pay for this safety is the inability to generate high returns. Finally, the wealth distribution is less sensitive to the choice of $U_{W}$: the distribution is tight even when $U_{W}=\infty$, given the absence of incentive to chase high returns.

In theory, if one wants to maximize the probability that the terminal wealth lies within the targeted range with the lower bound $L_{W}=1.0$ and a large enough upper bound $U_{W}$, the optimal decision should be to allocate all the capital to the risk-free asset. Numerically, though, it is difficult to guarantee a full allocation in the risk-free asset at all times and for all paths. Intuitively, the reason for this is the following: for the portfolios allocated mostly to the risk-free asset, most, if not all, of the terminal wealth realizations will lie within the targeted range, which makes the value function flat and almost invariant among these conservative portfolio allocations.

Figure 10 provides some examples for the relative target range strategy (RTRS) of Section 4.3, with a passive equal-weight portfolio as the benchmark. The probability that the portfolio value underperforms the benchmark portfolio remains small (around 6–8% for the excess return distributions), although it is higher than the corresponding probabilities under absolute targets. The reason for this is that the passive equal-weight benchmark already delivers a high expected return; therefore, outperforming it requires taking more risk than was necessary in the previous absolute return target examples.

## 6 Conclusions

This paper introduces the STRS for portfolio optimization problems. The STRS maximizes the expected portfolio value while simultaneously restraining the bulk of the return distribution within a predefined range. This joint goal is achieved with an unconstrained optimization formulation, which achieves, in a simpler manner, similar results to those that can be expected from more complex constrained optimization methods. To illustrate the effectiveness of the STRS, we study a multiperiod portfolio optimization problem and propose a two-stage LSMC method to handle the new objective function. The two-stage regression method can also be adopted for general investment objectives such as the smooth CRRA utility. We show that this regression method substantially improves the numerical stability of the LSMC algorithm compared with direct regression. We show that the STRS achieves a similar mean–variance efficient frontier while delivering a better downside risk–return trade-off compared with the CRRA utility approach. We find that the recommended level for the lower bound of the target range is the initial portfolio value, at which the standard deviation and the downside risk of the terminal portfolio value are marginally minimized. From there, the upper bound of the target range can be set based on risk preferences.

Going further, the unconstrained optimization formulation used by the STRS, built upon an indicator function, has the potential to incorporate additional range constraints on other dynamic risk measures, such as realized volatility or maximum drawdown. This is an area we wish to investigate in future research.

## Declaration of interest

The authors report no conflicts of interest. The authors alone are responsible for the content and writing of the paper.

## Acknowledgements

The authors are grateful to Stephen Brown, Wen Chen, Peter Forsyth, Cornelis Oosterlee and the two anonymous referees for their valuable comments and remarks.

## References

• Agarwal, V., and Naik, N. Y. (2004). Risks and portfolio decisions involving hedge funds. Review of Financial Studies 17(1), 63–98 (https://doi.org/10.1093/rfs/hhg044).
• Alexander, G. J., and Baptista, A. M. (2002). Economic implications of using a mean-VaR model for portfolio selection: a comparison with mean–variance analysis. Journal of Economic Dynamics and Control 26(7-8), 1159–1193 (https://doi.org/10.1016/S0165-1889(01)00041-0).
• Andreasson, J., and Shevchenko, P. (2018). Bias-corrected least-squares Monte Carlo for utility based optimal stochastic control problems. Working Paper SSRN:2985828, Social Science Research Network (https://doi.org/10.2139/ssrn.3200164).
• Balata, A., and Palczewski, J. (2018). Regress-later Monte Carlo for optimal control of Markov processes. Preprint (arXiv:1712.09705).
• Barberis, N. (2012). A model of casino gambling. Management Science 58(1), 35–51 (https://doi.org/10.1287/mnsc.1110.1435).
• Brandt, M., Goyal, A., Santa-Clara, P., and Stroud, J. (2005). A simulation approach to dynamic portfolio choice with an application to learning about return predictability. Review of Financial Studies 18, 831–873 (https://doi.org/10.1093/rfs/hhi019).
• Brogan, A. J., and Stidham, S., Jr. (2005). A note on separation in mean-lower-partial-moment portfolio optimization with fixed and moving targets. IIE Transactions 37(10), 901–906 (https://doi.org/10.1080/07408170591007803).
• Browne, S. (1999a). Beating a moving target: optimal portfolio strategies for outperforming a stochastic benchmark. Finance and Stochastics 3(3), 275–294 (https://doi.org/10.1007/s007800050063).
• Browne, S. (1999b). The risk and rewards of minimizing shortfall probability. Journal of Portfolio Management 25(4), 76–85 (https://doi.org/10.3905/jpm.1999.319754).
• Carriere, J. (1996). Valuation of the early-exercise price for options using simulations and nonparametric regression. Insurance: Mathematics and Economics 19(1), 19–30 (https://doi.org/10.1016/S0167-6687(96)00004-2).
• Cong, F., and Oosterlee, C. W. (2016a). Multi-period mean–variance portfolio optimization based on Monte Carlo simulation. Journal of Economic Dynamics and Control 64, 23–38 (https://doi.org/10.1016/j.jedc.2016.01.001).
• Cong, F., and Oosterlee, C. W. (2016b). On pre-commitment aspects of a time-consistent strategy for a mean–variance investor. Journal of Economic Dynamics and Control 70(1), 178–193 (https://doi.org/10.1016/j.jedc.2016.07.010).
• Cong, F., and Oosterlee, C. W. (2017). Accurate and robust numerical methods for the dynamic portfolio management problem. Computational Economics 49(3), 433–458 (https://doi.org/10.1007/s10614-016-9569-0).
• Dang, D.-M., Forsyth, P., and Vetzal, K. (2017). The 4% strategy revisited: a pre-commitment mean–variance optimal approach to wealth management. Quantitative Finance 17(3), 335–351 (https://doi.org/10.1080/14697688.2016.1205211).
• Davis, M., and Norman, A. (1990). Portfolio selection with transaction costs. Mathematics of Operations Research 15(4), 676–713 (https://doi.org/10.1287/moor.15.4.676).
• Denault, M., and Simonato, J.-G. (2017). Dynamic portfolio choices by simulation-and-regression: revisiting the issue of value function vs portfolio weight recursions. Computers and Operations Research 79, 174–189 (https://doi.org/10.1016/j.cor.2016.09.022).
• Franks, E. C. (1992). Targeting excess-of-benchmark returns. Journal of Portfolio Management 18(4), 6–12 (https://doi.org/10.3905/jpm.1992.409419).
• Gaivoronski, A. A., Krylov, S., and van der Wijst, N. (2005). Optimal portfolio selection and dynamic benchmark tracking. European Journal of Operational Research 163(1), 115–131 (https://doi.org/10.1016/j.ejor.2003.12.001).
• Garlappi, L., and Skoulakis, G. (2009). Numerical solutions to dynamic portfolio problems: the case for value function iteration using Taylor approximation. Computational Economics 33, 193–207 (https://doi.org/10.1007/s10614-008-9156-0).
• Harlow, W. V. (1991). Asset allocation in a downside-risk framework. Financial Analysts Journal 47(5), 28–40 (https://doi.org/10.2469/faj.v47.n5.28).
• Hata, H., Nagai, H., and Sheu, S.-J. (2010). Asymptotics of the probability minimizing a “down-side” risk. Annals of Applied Probability 20(1), 52–89 (https://doi.org/10.1214/09-AAP618).
• Jain, S., and Oosterlee, C. W. (2015). The stochastic grid bundling method: efficient pricing of Bermudan options and their Greeks. Applied Mathematics and Computation 269(1), 412–431 (https://doi.org/10.1016/j.amc.2015.07.085).
• Kharroubi, I., Langrené, N., and Pham, H. (2014). A numerical algorithm for fully nonlinear HJB equations: an approach by control randomization. Monte Carlo Methods and Applications 20(2), 145–165 (https://doi.org/10.1515/mcma-2013-0024).
• Klebaner, F., Landsman, Z., Makov, U., and Yao, J. (2017). Optimal portfolios with downside risk. Quantitative Finance 17(3), 315–325 (https://doi.org/10.1080/14697688.2016.1197411).
• Konno, H., Shirakawa, H., and Yamazaki, H. (1993). A mean–absolute deviation–skewness portfolio optimization model. Annals of Operations Research 45(1), 205–220 (https://doi.org/10.1007/BF02282050).
• Lai, T. (1991). Portfolio selection with skewness: a multiple-objective approach. Review of Quantitative Finance and Accounting 1(3), 293–305 (https://doi.org/10.1007/BF02408382).
• Longerstaey, J. (1996). RiskMetrics: technical document. Technical Report, JP Morgan.
• Longstaff, F., and Schwartz, E. (2001). Valuing American options by simulation: a simple least-squares approach. Review of Financial Studies 14(1), 113–147 (https://doi.org/10.1093/rfs/14.1.113).
• Markowitz, H. (1952). Portfolio selection. Journal of Finance 7(1), 77–91 (https://doi.org/10.1111/j.1540-6261.1952.tb01525.x).
• Markowitz, H. (1959). Portfolio Selection: Efficient Diversification of Investment. Wiley.
• Milevsky, M. A., Moore, K. S., and Young, V. R. (2006). Asset allocation and annuity-purchase strategies to minimize the probability of financial ruin. Mathematical Finance 16(4), 647–671 (https://doi.org/10.1111/j.1467-9965.2006.00288.x).
• Morton, D. P., Popova, E., and Popova, I. (2006). Efficient fund of hedge funds construction under downside risk measures. Journal of Banking and Finance 30(2), 503–518 (https://doi.org/10.1016/j.jbankfin.2005.04.016).
• Nagai, H. (2012). Downside risk minimization via a large deviation approach. Annals of Applied Probability 22(2), 608–669 (https://doi.org/10.1214/11-AAP781).
• Pham, H. (2003). A large deviations approach to optimal long term investment. Finance and Stochastics 7(2), 169–195 (https://doi.org/10.1007/s007800200082).
• Rockafellar, R., and Uryasev, S. (2000). Optimization of conditional value-at-risk. The Journal of Risk 2(3), 21–42 (https://doi.org/10.21314/JOR.2000.038).
• Tsitsiklis, J., and Van Roy, B. (2001). Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks 12(4), 694–703 (https://doi.org/10.1109/72.935083).
• Tversky, A., and Kahneman, D. (1992). Advances in prospect theory: cumulative representation of uncertainty. Journal of Risk and Uncertainty 5(4), 297–323 (https://doi.org/10.1007/BF00122574).
• Van Binsbergen, J. H., and Brandt, M. (2007). Solving dynamic portfolio choice problems by recursing on optimized portfolio weights or on the value function? Computational Economics 29, 355–367 (https://doi.org/10.1007/s10614-006-9073-z).
• von Neumann, J., and Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press.
• Winkelbauer, A. (2014). Moments and absolute moments of the normal distribution. Preprint (arXiv:1209.4340).
• Zhang, R., Langrené, N., Tian, Y., Zhu, Z., Klebaner, F., and Hamza, K. (2019). Dynamic portfolio optimization with liquidity cost and market impact: a simulation-and-regression approach. Quantitative Finance 19(3), 519–532 (https://doi.org/10.1080/14697688.2018.1524155).
