Journal of Computational Finance

Risk.net

A review of tree-based approaches to solving forward–backward stochastic differential equations

Long Teng

  • A regression-tree-based approach is proposed to solve numerically high-dimensional backward stochastic differential equations (BSDEs).
  • The approach is reviewed from different perspectives, and error analysis is performed.
  • Several numerical experiments including high-dimensional problems are provided to demonstrate accuracy and performance.
  • For the applicability of BSDEs in financial problems, the Heston stochastic volatility model and a high-dimensional nonlinear pricing problem are solved with the presented approach via BSDEs.

In this work, we study ways of solving (decoupled) forward–backward stochastic differential equations numerically using regression trees. Based on general theta-discretization for time integrands, we show how to efficiently use regression-tree-based methods to solve the resulting conditional expectations. Several numerical experiments, including high-dimensional problems, are provided to demonstrate accuracy and performance. To show the applicability of forward–backward stochastic differential equations to financial problems, we apply our tree-based approach to the Heston stochastic volatility model to high-dimensional nonlinear pricing problems.

1 Introduction

It is well known that many problems (eg, pricing and hedging) in the field of financial mathematics can be represented in terms of (forward–)backward stochastic differential equations ((F)BSDEs). These make problems easier to solve but usually have no analytical solution (see, for example, Karoui et al 1997a). The general form of (decoupled) FBSDEs reads

  dXt=a(t,Xt)dt+b(t,Xt)dWt,X0=x0,-dYt=f(t,Xt,Yt,Zt)dt-ZtdWt,YT=ξ=g(XT),}   (1.1)

where Xt,an, b is an n×d matrix, f(t,Xt,Yt,Zt):[0,T]×n×m×m×dm is the driver function and ξ is the square-integrable terminal condition.

The existence and uniqueness of solutions to such equations, under the Lipschitz conditions on f, a(t,Xt), b(t,Xt) and g, were proved by Pardoux and Peng (1990). Since then, many works have tried to relax this condition: for example, the uniqueness of the solution was extended under more general assumptions for f in Lepeltier and Martin (1997), but only in the one-dimensional case.

In recent years, many numerical methods have been proposed for coupled and decoupled FBSDEs. For the numerical algorithms with (least-squares) Monte Carlo approaches we refer the reader to Bender and Steiner (2012), Bouchard and Touzi (2004), Gobet et al (2005), Lemor et al (2006) and Zhao et al (2006); the multilevel Monte Carlo method based on Picard approximation can be found in E et al (2019). There exists a connection between BSDEs and partial differential equations (PDEs) (see Karoui et al 1997b; Peng 1991); some numerical schemes with the aid of this connection can be found in, for example, Ma et al (1994). For the deep-learning-based numerical method we refer to E et al (2017). The approach based on the Fourier method for BSDEs is developed in Ruijter and Oosterlee (2015). See also Zhao et al (2010) and Teng et al (2020) for the multi-step scheme.

In Crisan and Manolarakis (2010) an algorithm with the cubature method is proposed to solve FBSDEs. The cubature method is an approximation method for the law of the solution of an SDE that generates a tree on which expectations and conditional expectations are evaluated. By introducing a minimal variance sampling method for the support of the cubature measure, Crisan and Manolarakis address the issue that the tree methods suffer from an exponential increase in the size of the support of the approximating measure. A method relying on an approximation of Brownian motion by a simple random walk is proposed in Ma et al (2002) for BSDEs: that is, the conditional expectations are computed using a tree structure, in which the final condition is allowed to be quite general. In this paper, we show how to efficiently use regression-tree-based approaches to find accurate approximations of FBSDEs (1.1). We apply the general theta-discretization method for the time integrands, and we approximate the resulting conditional expectations using the regression-tree-based approach with sample-splitting techniques. More precisely, we deal with the projection of samples via trees to overcome the curse of dimensionality. Several numerical experiments of different types including 100-dimensional problems are performed to demonstrate our findings.

In the next section we give some notation and definitions, and in Section 3 we introduce the generalized theta method. Section 4 is devoted to how to use the regression-tree-based approaches to approximate conditional expectations. In Section 5 several numerical experiments on different types of FBSDEs, including financial applications, are provided to show the accuracy and applicability for high-dimensional nonlinear problems. Section 6 presents our conclusions.

2 Preliminaries

Throughout the paper, we assume that (Ω,,P;{t}0tT) is a complete, filtered probability space. In this space, a standard d-dimensional Brownian motion Wt with a finite terminal time T is defined, which generates the filtration {t}0tT, that is, t=σ{Ws,0st}, for FBSDEs. The usual hypotheses should be satisfied. We denote the set of all t-adapted and square integrable processes in d by L2=L2(0,T;d). A pair of processes (Yt,Zt) is the solution to the FBSDEs in (1.1) if it is t-adapted and square integrable and satisfies (1.1) as

  Yt=ξ+tTf(s,(Xs),Ys,Zs)ds-tTZsdWs,t[0,T],   (2.1)

where f(t,?s):=f(t,(Xs),Ys,Zs):[0,T](×n)×m×m×dm is t-adapted and ξ=g(XT):nm.

Suppose that the terminal value YT is of the form g(XTt,x), where XTt,x denotes the solution of dXt in (1.1) starting from x at time t. Then the solution (Ytt,x,Ztt,x) of FBSDEs in (1.1) can be represented (Karoui et al 1997b; Ma and Zhang 2005; Peng 1991) as

  Ytt,x=u(t,x),Ztt,x=(u(t,x))b(t,x)t[0,T),   (2.2)

which is the solution to the semi-linear parabolic PDE of the form

  ut+inaiiu+12i,jn(bbT)i,ji,j2u+f(t,x,u,(u)b)=0,   (2.3)

with the terminal condition u(T,x)=g(x). In turn, suppose (Y,Z) is the solution to FBSDEs, u(t,x)=Ytt,x is a viscosity solution to the PDEs.

3 Discretization of the forward–backward stochastic differential equation using the theta method

We introduce the time partition for the time interval [0,T]:

  Δt={titi[0,T],i=0,1,,NT,ti<ti+1,t0=0,tNT=T}.   (3.1)

Let Δti=ti+1-ti be the time step, and denote the maximum time step by Δt. For the FBSDEs, we need to additionally discretize the forward SDE in (1.1):

  Xt=x0+0ta(s,Xs)ds+0tb(s,Xs)dWs.   (3.2)

Suppose that the forward SDE in (3.2) can already be discretized by a process XtiΔt such that

  E[maxti|Xti-XtiΔt|2]=?(Δt),   (3.3)

which means strong mean square convergence of order 12. For the backward process (2.1), the well-known generalized θ-discretization reads

  YNTΔt=g(XNTΔt),ZNTΔt=gx(XNTΔt).   (3.4)

For i=NT-1,,0, we have

  -Ei[Yi+1ΔWi+1] =Δti(1-θ1)Ei[f(ti+1,?i+1)ΔWi+1]-Δtiθ2Zi  
      -Δti(1-θ2)Ei[Zi+1]+RθZi,   (3.5)
  Yi =Ei[Yi+1]+Δtiθ3f(ti,?i)  
      +Δti(1-θ3)Ei[f(ti+1,?i+1)]+RθYi,   (3.6)

where RθYi and RθZi denote the discretization errors (see, for example, Li et al 2017; Zhao et al 2012, 2014). Obviously, by choosing the different values for θl[0,1], l=1,2,3, we can obtain different schemes. Note that θ2 cannot be set to zero.

4 Computation of conditional expectations with the tree-based approach

In this section we show how to innovatively use the tree-based approach (Martinez and Martinez 2007) to compute the conditional expectations included in (3.5) and (3.6). More precisely, the conditional expectation E[?i+1Xi]=η(Xi), Xim, is represented by a tree, which is constructed with certain samples of Xi and ?i+1. The tree will be reused to very efficiently generate samples of η() by directly taking a group of arbitrary samples of Xi (generated from Xi-1) as one input for further iterations. That is to say, we deal with the projection of samples via trees using (3.5) and (3.6); the curse of dimensionality can thus be overcome by, for example, considering the group of Xim generated from Xi-1 entirely as one input for the tree.

4.1 Nonparametric regression

Suppose that the model in nonparametric regression reads

  Y=η(X)+ε,   (4.1)

where ε has a zero expectation and a constant variance, and it is independent of X. It holds that

  E[YX=x]=η(x).   (4.2)

The goal in regression is to find the estimator of this function, η^(x). By nonparametric regression, we are not assuming a particular form for η. Instead, η^ is represented in a regression tree. Given a set of samples (x^,y^), =1,,M, for (X,Y), where X denotes a predictor variable and Y presents the corresponding response variable, we construct a regression tree that can then be used to compute E[YX=x] for an arbitrary x, whose value is not necessarily equal to one of the samples x^.

As an example, we specify the procedure for (3.5) (θ1=12,θ2=1,θ3=12). Assume that (XiΔt)tiΔt is Markovian, so (3.5) can thus be rewritten as

  ZiΔt=E[1ΔtiYi+1ΔtΔWi+1 +12f(ti+1,?i+1Δt)ΔWi+1|XiΔt],i=NT-1,,0.   (4.3)

There also exist deterministic functions ziΔt(x) such that

  ZiΔt=ziΔt(XiΔt).   (4.4)

Starting from time T, we construct the regression tree T^z for the conditional expectation in (4.3) using the following samples:

  (x^NT-1,,1ΔtNT-1y^NT,Δw^NT,+12f^NT,Δw^NT,).  

Thereby, the function

  zNT-1Δt(x)=E[1ΔtNT-1YNTΔtΔWNT+12f(tNT,?NTΔt)ΔWNT|XNT-1Δt=x]   (4.5)

is estimated and presented by a regression tree. By applying (4.5) to the samples x^NT-1, we can directly obtain the samples z^NT-1, of the random variable ZNT-1Δt for =1,,M. Recursively, backward in time, these samples z^NT-1, will be used to generate samples z^NT-2, of the random variables ZNT-2Δt at time tNT-2. At the initial time t=0, we fix an initial value x0 for Xt, and no samples are needed. Using the regression trees constructed at time t1 we obtain the solution Z0Δt=z0Δt(x0).

4.2 Binary regression tree

We review the procedure in Breiman et al (1984) and Martinez and Martinez (2007) for constructing the best regression tree based on the given samples. Basically, we need to grow, prune and finally select the tree. We use the following notation.

  • (x^,y^) denote samples, namely observed data.

  • t^ is a node in the tree T^, and t^L and t^R are the left and right child nodes.

  • ? is the set of terminal nodes in the tree T^ with the number |?|.

  • n(t^) represents the number of samples in node t^.

  • y¯(t^) is the average of samples falling into node t^, namely the predicted response.

4.2.1 Growing a tree

We define the predicted response as the average value of the samples that are contained in a node t^, ie,

  y¯(t^)=1n(t^)x^t^y^.   (4.6)

Obviously, the squared error in the node t^ reads

  R(t^)=1n(t^)x^t^(y^-y¯(t^))2.   (4.7)

The mean squared error for the tree T^ is defined as

  R(T^)=t^?R(t^)=1n(t^)t^?x^t^(y^-y¯(t^))2.   (4.8)

Basically, the tree is constructed by partitioning the space for the samples x^ using a sequence of binary splits. For a split s and node t^, the change in the mean squared error can be thus calculated as

  ΔR(s,t^)=R(t^)-R(t^L)-R(t^R).   (4.9)

The regression tree is thus obtained by iteratively splitting nodes with s, which yields the largest ΔR(s,t^). Thereby, the decrease in R(T^) is maximized. Obviously, the optimal stopping criterion is that all responses in a terminal node are the same, but that is not really realistic.

4.2.2 Pruning a tree

When using the optimal stopping criterion, each terminal node contains only one response, so the error R(t^), and thus R(T^), will be zero. However, first of all, this is unrealistic. Second, the samples are thereby overfitted and the regression tree will thus not generalize well to new observed samples. Breiman et al (1984) suggest growing an overly large regression tree T^max and then finding a nested sequence of subtrees by successively pruning the branches of the tree. This procedure is called pruning a tree. We define an error-complexity measure as

  Rα(T^)=R(T^)+α|?|,α0,   (4.10)

where α represents the complexity cost per terminal node. The error complexity should be minimized by looking for trees. Let T^max be the overly large tree, in which each terminal node contains only one response. Thus, we have Rα(T^max)=α|?|, which indicates a high cost of complexity, while the error is small. To minimize the cost we delete the branches with the weakest link t^k* in tree T^k, which is defined as

  gk(t^k*)=mint^{gk(t^)},gk(t^)=R(t^)-R(T^kt^)|?kt^|-1,   (4.11)

where T^kt^ is the branch of T^t^ corresponding to the internal node t^ of subtree T^k. We prune the branch defined by the node t^k*,

  T^k+1=T^k-T^t^k*,   (4.12)

and thus we obtain a finite sequence of subtrees with fewer terminal nodes and decreasing complexity until the root node, which is

  T^max>T^1>T^2>>T^K=root.   (4.13)

On the other hand, we set

  αk+1=gk(t^k*),   (4.14)

and we thus obtain an increasing sequence of values for the complexity parameter α:

  0=α1<<αk<αk+1<<αK.   (4.15)

By observing both sequences (4.13) and (4.15), it is not difficult to see that for k1 the tree T^k is the one that has the minimal cost complexity for αkα<αk+1.

4.2.3 Selecting a tree

We have to make a trade-off between both the error criteria and the complexity, ie, we need to choose the best tree from the sequence of pruned subtrees. To do this, we can use independent test samples and cross-validation. As an example, we illustrate the independent test sample method; for cross-validation we refer the reader to Breiman et al (1984) and Martinez and Martinez (2007). Clearly, we need honest estimates of the true error R*(T^) to select the right size of tree. To obtain these estimates, we should use samples that were not used to construct the tree to estimate the error. Suppose we have a set of samples L=(x^,y^), which can be randomly divided into two subsets L1 and L2. We use the set L1 to grow a large tree and to obtain the sequence of pruned subtrees. Thus, the samples in L2 are used to evaluate the performance of each subtree by calculating the error between the real response and predicted response. We denote by y¯k(x^) the predicted response using samples x^ for the tree T^k. Then, the estimated error is

  R^(T^k)=1n2(x^i,y^i)L2(y^i-y¯k(x^i))2,   (4.16)

where n2 is the number of samples in L2. This estimated error will be calculated for all subtrees. We need to pick the subtree that has the smallest number of nodes but still keeps the accuracy of the tree with the smallest error, eg, T^0 with an error R^min(T^0). To do this, we define the standard error for this estimate as (Breiman et al 1984)

  SE(R^min(T^0)):=1n21n2i=1n2(y^i-y¯(xi^))4-(R^min(T^0))2,   (4.17)

and then we choose the smallest tree T^k* such that

  R^(T^k*)R^min(T^0)+SE(R^min(T^0)).   (4.18)

T^k* is a tree with minimal complexity cost but has equivalent accuracy to the tree with the minimum error.

4.3 Practical applications

Due to the linearity of conditional expectation, we construct the trees for all possible combinations of the conditional expectations. We denote the tree’s regression error by Rtr and the error of the used iterative method by Rimpl, and we reformulate the scheme (3.4)–(3.6) by combining conditional expectations as

  y^NT, =g(x^NT,),z^NT,=gx(x^NT,).   (4.19)

For i=NT-1,,0, =1,,M, we have

  ziΔt(x^i,) =Eix^i,[1ΔtiYi+1ΔWi+1+12f(ti+1,?i+1)ΔWi+1]+RtrZi,   (4.20)
  yiΔt(x^i,) =Eix^i,[Yi+1+Δti2f(ti+1,?i+1)]+Δti2f^i,+RimplYi+RtrYi,   (4.21)

where Eix^i,[?] denotes the calculated conditional expectation E[?X=x^i,] using the constructed regression tree with the samples of ?. For example, using samples of the predictor variable Xi (ie, x^i,) and samples of the response variable

  1ΔtiYi+1ΔWi+1+12f(ti+1,?i+1)ΔWi+1  

(ie, (1/Δti)y^i+1,Δw^i+1,+(1/2)f^i+1,Δw^i+1,), we construct a regression tree. Then,

  Eix^i,[1ΔtiYi+1ΔWi+1+12f(ti+1,?i+1)ΔWi+1]  

means the computed value of

  E[1ΔtiYi+1ΔWi+1+12f(ti+1,?i+1)ΔWi+1|X=x^i,]  

using that constructed tree. This is to say that y^i, and z^i, are computed based on the tree at the ith step, which depends on y^i+1, and z^i+1,. Let

  ?^i+1, :=1Δtiy^i+1,Δw^i+1,+12f^i+1,Δw^i+1,,  
  ?^i+1, :=y^i+1,+12f^i+1,.  

Thus, we define

  RtrZi=1M=1M(z^i,-ziΔt(x^i,))2  

and

  RtrYi=1M=1M(y^i,-yiΔt(x^i,))2  

with

  ziΔt(x^i,)=?^i+1,-εi,zandyiΔt(x^i,)=?^i+1,-εi,y,  

while εiz and εiy have zero mean and constant variance Variz and Variy, respectively. RtrZi and RtrYi can thus be reformulated as

  RtrZi =1M=1M(εi,z-(?^i+1,-z^i,))2  
    2(Variz+R^min(T^Zi)),   (4.22)

and

  RtrYi =1M=1M(εi,y-(?^i+1,-y^i,))2  
    2(Variy+R^min(T^Yi)),   (4.23)

where R^min(T^Zi) and R^min(T^Yi) are defined via (4.16). T^Zi and T^Yi denote the trees with the smallest error at the ith step. Note that (4.17) and (4.18) are used just to find the tree with the smallest error in (4.16), which has the smallest number of nodes.

Theoretically, the regression error can be neglected by choosing sufficiently large M. However, the tree-based approach is computationally not that efficient for a reasonably high value of M. Due to the fact that we are only interested in the solutions of Y0 and Z0 conditional on a known present value of X0, we propose to split a large sample into a few groups of smaller samples at tNT and project samples for tNt1 in different groups. The main reason why we do this is that multiple tree-based computations for a small sample are still more efficient than one computation for a large number of samples. The computation can be done rapidly in the case of a constant predictor, and the quality of approximations for Y0Δt and Z0Δt relies directly on the samples of Y1Δt and Z1Δt. Therefore, for the step t1t0, we combine the samples of Y1Δt and Z1Δt from all groups, which are used as our samples of response variables. We also use W0=0 or X0=x0 as the predictor variable. Our numerical results show that the splitting error of the samples’ projection from tNTt1 could be neglected. As an example, we propose the following fully discrete scheme by choosing θ1=12, θ2=1 and θ3=12.

  1. (1)

    Generate M samples and split them into G different groups, where the sample number in each group is Mg=M/G.

  2. (2)

    For each group, namely =1,,Mg, compute

      y^NT,=g(x^NT,),z^NT,=gx(x^NT,).  

    For i=NT-1,,1, =1,,Mg,

      z^i, =Eix^i,[1ΔtiYi+1ΔtΔWi+1+12f(ti+1,?i+1Δt)ΔWi+1],  
      y^i, =Eix^i,[Yi+1Δt+Δti2f(ti+1,?i+1Δt)]+Δti2f^i,.  
  3. (3)

    Collect all the samples of (z^1,,y^1,) for =1,,M and use all these samples to compute

      Z0Δt =E0x0[1Δt0Y1ΔtΔW1+12f(t1,?1Δt)ΔW1],  
      Y0Δt =E0x0[Y1Δt+Δt02f(t1,?1Δt)]+Δt02f^0,.  

4.4 Error estimates

We perform the error analysis for the scheme θ1=12, θ2=1 and θ3=12; a similar analysis can be done for other choices of θl. Supposing that the errors of the iterative method can be neglected by choosing a sufficiently high number of Picard iterations, we consider the initial discretization and regression errors. The errors due to the time-discretization are given by

  εYi,θ :=Yi-YiΔt,  
  εZi,θ :=Zi-ZiΔt,  
  εfi,θ :=f(ti,?i)-f(ti,?iΔt).  

Analogously to the deterministic functions ziΔt given in (4.4), we define the deterministic functions yiΔt as

  YiΔt=yiΔt(XiΔt).  

These functions are approximated by the regression trees, resulting in the approximations y^iΔt,z^iΔt with

  Y^iΔt=y^iΔt(XiΔt)andZ^iΔt=z^iΔt(XiΔt).  

Thus, we denote the global errors by

  εYi :=Yi-Y^iΔt,   (4.24)
  εZi :=Zi-Z^iΔt,   (4.25)
  εfi :=f(ti,?i)-f(ti,?^iΔt).   (4.26)
Assumption 4.1.

Suppose that X0 is F0-measurable with E[|X0|2]<, and that a and b are L2-measurable in (t,x)[0,T]×Rd, as well as linear-growth-bounded and uniformly Lipschitz continuous, ie, there exist positive constants K and L such that

  |a(t,x)|2 K(1+|x|2),  
  |b(t,x)|2 K(1+|x|2),  
  |a(t,x)-a(t,y)| L|x-y|,  
  |b(t,x)-b(t,y)| L|x-y|,  

with x,yRd.

Let Cbl,k,k,k be the set of continuously differentiable functions f:[0,T]×Rn×Rm×Rm×dRm with uniformly bounded partial derivatives tl1f for 12l1l and xk1yk2zk3f for 1k1+k2+k3k; Cbl,k,k and Cbl,k can be defined analogously, and Cbk is the set of functions g:RdRm with uniformly bounded partial derivatives xk1g for 1k1k.

We give some remarks concerning related results on the one-step scheme.

  • Under Assumption 4.1, if fCb2,4,4,4, gCb4+α for some α(0,1), a and b are bounded and a,bCb2,4, then the absolute values of the local errors RθYi and RθZi in (3.6) and (3.5) can be bounded by C(Δti)3 when θl=12 and l=1,2,3 and by C(Δti)2 when θ1=12, θ2=1 and θ3=12, where C is a constant that can depend on T, x0 and the bounds of a, b, f and g in (1.1) (see, for example, Li et al 2017; Zhao et al 2009, 2012, 2013, 2014).

  • For convenience of notation we can omit the dependency of local and global errors on the state of the FBSDEs and the discretization errors of dXt, ie, we assume Xi=XiΔt.

  • For the implicit schemes we will apply Picard iterations that converge due to the Lipschitz assumptions on the driver and for any initial guess when Δti is small enough. In the following analysis, we consider the equidistant time discretization Δt.

For the Z-component (0iNT-1) we have (see (4.20))

  εZi=Eixi[1ΔtεYi+1ΔWi+1+12εfi+1ΔWi+1]+RθZiΔt+RtrZi,   (4.27)

where εfi+1 can be bounded using Lipschitz continuity of f by

  Eixi[|εfi+1|2] Eixi[|L(|εYi+1|+|εZi+1|)|2]  
    2L2Eixi[|εYi+1|2+|εZi+1|2],   (4.28)

with Lipschitz constant L. And it holds that

  |Eixi[εYi+1ΔWi+1]|2 =|Eixi[(εYi+1-Eixi[εYi+1])ΔWi+1]|2  
    Δt(Eixi[|εYi+1|2]-|Eixi[εYi+1]|2).   (4.29)

Consequently, we calculate

  (Δt)2|εZi|2 2Δt(Eixi[|εYi+1|2]-|Eixi[εYi+1]|2)  
      +3L2(Δt)3Eixi[|εYi+1|2+|εZi+1|2]  
      +6|RθZi|2+6(Δt)2|RtrZi|2,   (4.30)

where Hölder’s inequality is used.

For the Y-component in the implicit scheme we have

  εYi=Eixi[εYi+1+Δt2εfi+1]+Δt2εfi+RθYi+RtrYi.   (4.31)

This error can be bounded by

  |εYi| |Eixi[εYi+1]|+ΔtL2(|εYi|+|εZi|)  
      +ΔtL2Eixi[|εYi+1|+|εZi+1|]+|RθYi|+|RtrYi|.   (4.32)

By using the inequality (a+b)2a2+b2+γΔta2+(b2/γΔt) we calculate

  |εYi|2 (1+γΔt)|Eixi[εYi+1]|2+3(ΔtL)22(|εYi|2+|εZi|2)  
    +3(ΔtL)22(Eixi[|εYi+1|2]+Eixi[|εZi+1|2])+6|RθYi|2+6|RtrYi|2  
    +1γ(3ΔtL22(|εYi|2+|εZi|2)+3ΔtL22(Eixi[|εYi+1|2]+Eixi[|εZi+1|2])+6|RθYi|2Δt+6|RtrYi|2Δt).   (4.33)
Theorem 4.2.

Under Assumption 4.1, if fCb2,4,4,4, gCb4+α for some α(0,1), a and b are bounded, a,bCb2,4, then given

  ENT-1xNT-1[|εZNT|2] ?((Δt)2),  
  ENT-1xNT-1[|εYNT|2] ?((Δt)2),  

it holds that

  E0x0[|εYi|2+Δt|εZi|24]  
    Q(Δt)2+Q~i+1NT(G((VariY)2+NT(VariY)2T+T(VariZ)2NT)),  
                         0iNT-1,   (4.34)

where Q is a constant that depends only on T, x0 and the bounds of f, g and a, b in (1.1); Q~ is a constant depending on T, x0 and L; VariY and VariZ are the bounded constants; and M and G are the number of samples and splitting groups, respectively.

Proof.

By combining both (4.30) and (4.33), it is straightforward to obtain

  Eixi[|εYi|2]+Δt2Eixi[|εZi|2]  
  (1+γΔt)|Eixi[εYi+1]|2+3(ΔtL)22(Eixi[|εYi|2]+Eixi[|εZi|2])  
    +3(ΔtL)22(Eixi[|εYi+1|2]+Eixi[|εZi+1|2])  
    +6Eixi[|RθYi|2]+6Eixi[|RtrYi|2]+(Eixi[|εYi+1|2]-|Eixi[εYi+1]|2)  
    +3(ΔtL)22(Eixi[|εYi+1|2]+Eixi[|εZi+1|2])+3Eixi[|RθZi|2]Δt  
    +3ΔtEixi[|RtrZi|2]  
    +1γ(3ΔtL22(Eixi[|εYi|2]+Eixi[|εZi|2])+3ΔtL22(Eixi[|εYi+1|2]+Eixi[|εZi+1|2])+6Eixi[|RθYi|2]Δt+6Eixi[|RtrYi|2]Δt),  

which implies

  (1-3(ΔtL)22-3ΔtL22γ)Eixi[|εYi|2]+(Δt2-3(ΔtL)22-3ΔtL22γ)Eixi[|εZi|2]  
    (1+γΔt+3(ΔtL)2+3ΔtL22γ)Eixi[|εYi+1|2]  
      +(3(ΔtL)2+3ΔtL22γ)Eixi[|εZi+1|2]+6Eixi[|RθYi|2]+6Eixi[|RtrYi|2]  
      +6Eixi[|RθYi|2]γΔt+6Eixi[|RtrYi|2]γΔt+3Eixi[|RθZi|2]Δt+3ΔtEixi[|RtrZi|2].  

We choose γ such that

  Δt/2-3ΔtL2/2γ3ΔtL2/2γ  

(ie, γ6L2). Thus, the above inequality can be rewritten as

  (1-3(ΔtL)22-3ΔtL22γ)Eixi[|εYi|2]+(3ΔtL22γ-3(ΔtL)22)Eixi[|εZi|2]  
    (1+γΔt+3(ΔtL)2+3ΔtL22γ)Eixi[|εYi+1|2]  
      +(3(ΔtL)2+3ΔtL22γ)Eixi[|εZi+1|2]+6Eixi[|RθYi|2]+6Eixi[|RtrYi|2]  
      +Eixi[|RθYi|2]L2Δt+Eixi[|RtrYi|2]L2Δt+3Eixi[|RθZi|2]Δt+3ΔtEixi[|RtrZi|2],  

which implies

  Eixi[|εYi|2]+Δt4Eixi[|εZi|2]  
    1+CΔt1-CΔt((Eixi[|εYi+1|2]+Δt4Eixi[|εZi+1|2])  
          +6Eixi[|RθYi|2]+6Eixi[|RtrYi|2]  
          +Eixi[|RθYi|2]L2Δt+Eixi[|RtrYi|2]L2Δt  
          +3Eixi[|RθZi|2]Δt+3ΔtEixi[|RtrZi|2]).  

By induction, we then obtain

  Eixi[|εYi|2]+Δt4Eixi[|εZi|2]  
    (1+CΔt1-CΔt)NT-i(ENT-1xNT-1[|εYNT|2]+Δt4ENT-1xNT-1[|εZNT|2])  
      +j=i+1NT(1+CΔt1-CΔt)j-i  
      ×(6Eixi[|RθYj|2]+Eixi[|RθYj|2]L2Δt+3Eixi[|RθZj|2]Δt  
        +6Eixi[|RtrYj|2]+Eixi[|RtrYj|2]L2Δt+3ΔtEixi[|RtrZj|2])  
    exp(2CT)(ENT-1xNT-1[|εYNT|2]+Δt4ENT-1xNT-1[|εZNT|2])  
      +exp(2CT)j=i+1NT(6Eixi[|RθYj|2]+Eixi[|RθYj|2]L2Δt+3Eixi[|RθZj|2]Δt+6Eixi[|RtrYj|2]+Eixi[|RtrYj|2]L2Δt+3ΔtEixi[|RtrZj|2]).  

The regression errors RtrZ and RtrY with M samples are given by (4.22) and (4.23), from which we can deduce, for example,

  |RtrYj|2G|2(Varjy+R^min(T^Yj))|2:=G(VarjY)2.  

Similarly, for |RtrYj|2/Δt and Δt|RtrZj|2 we obtain

  GNT(VarjY)2TandGT(VarjZ)2NT,  

respectively. Finally, with the known conditions and bounds of the local errors mentioned above, the proof is complete. ∎

5 Numerical experiments

In this section we use some numerical examples to show the accuracy of our methods for solving the FBSDEs. NT and M are the total number of discrete time steps and the sample size, respectively. For all the examples, we consider an equidistant time partition and perform 20 Picard iterations. We run the algorithms 10 times independently and take the average value of the absolute error, where two different seeds are used every five simulations. Note that in the following numerical experiments we present the errors of Y0Δt and Z0Δt corresponding to Theorem 4.2. The errors at other time steps and for different fixed sample points are similar to the results in our test. The numerical experiments are performed using an Intel Core i5-8500 3.00 GHz processor with 15 GB RAM.

5.1 Example of linear BSDE

The first BSDE we consider is

  -dYt=(Yt2-Zt2)dt-ZtdWt,YT=sin(WT+T2),   (5.1)

with the analytical solution

  Yt=sin(Wt+t2),Zt=cos(Wt+t2).   (5.2)

The generator f contains the components Yt and Zt. We set T=12, where the analytical solution of (Y0,Z0) is (0,1).

First, in order to see the computational acceleration by using the sample-splitting introduced above, we compare the scheme of using the sample-splitting (samples merging at different steps) with not using the sample-splitting in Figure 1. Letting Y0,kΔt and Z0,kΔt denote the result on the kth run of the algorithm, k=1,,10, the approximations read as

  Y0Δt=110k=110Y0,kΔtandZ0Δt=110k=110Z0,kΔt.  

In our tests we consider the average of the absolute errors: that is,

  110k=110|Y0,kΔt-Y0|and110k=110|Z0,kΔt-Z0|.  

We see that there are no considerable differences for approximating Y0. Also, the approximations of Z0 with the sample-splitting against M converge in a very stable fashion. The sample-splitting allows a very efficient computation, especially merging at the penultimate step. In the remainder of this paper we perform all the schemes, always using the splitting with Mg=1000 unless otherwise specified.

Comparison of using and not using sample-splitting for .... The average run times are given in seconds. (a) .... (b) ....
Figure 1: Comparison of using and not using sample-splitting for NT=10. The average run times are given in seconds. (a) 110k=110|Y0,kΔt-Y0|. (b) 110k=110|Z0,kΔt-Z0|.

Next we study the influence of M on the error. We fix the number of steps to 2 and test all possible values of θl. We find that the explicit schemes for θ3=0, θ2=1 and θ1=12,1 and the implicit schemes for θ3=12,1, θ2=1 and θ1=12,1 show good convergence in terms of M. As an example we report the absolute errors and the empirical standard deviations

  19k=110|Y0,kΔt-Y0Δt|2,19k=110|Z0,kΔt-Z0Δt|2  

for some chosen schemes in Table 1. We observe that the second-order scheme (θ1=12,θ2=12,θ3=12) gives a very small error for quite a large M, although NT=2. This is to say that the linear example is oversimplified in the scheme (θ1=12,θ2=12,θ3=12) with the tree-based regression and can thus mislead in terms of efficiency. Therefore, in order to obtain reliable numerical convergence rates for our proposed tree-based approach, we will consider (θ1=12,θ2=1,θ3=12) for the following analysis and all the numerical examples, which provides the smallest errors for both Y and Z in all the schemes except for the scheme (θ1=12,θ2=12,θ3=12). To take this a step further, we now fix M=200 000 and plot in Figure 2 the absolute error against the number of steps. We see that the scheme converges meaningfully.

Table 1: Comparison of absolute errors for NT=2 with sample size M.
(a) 110k=110|Y0,kΔt-Y0|
  ?
   
  2 000 5 000 10 000 50 000 100 000 200 000 300 000000
(θ1=1,θ2=1,θ3=12) 0.0254 0.0220 0.0251 0.0248 0.0247 0.0244 0.0246
Standard deviation 0.0193 0.0147 0.0099 0.0023 0.0021 0.0016 8.3067e-04
(θ1=12,θ2=1,θ3=12) 0.0177 0.0128 0.0125 0.0123 0.0121 0.0118 0.0120
Standard deviation 0.0196 0.0151 0.0102 0.0023 0.0021 0.0017 9.1699e-04
(θ1=12,θ2=12,θ3=12) 0.0169 0.0129 0.0073 0.0020 0.0019 0.0017 7.2826e-04
Standard deviation 0.0197 0.0162 0.0113 0.0025 0.0022 0.0019 0.0011
(θ1=1,θ2=1,θ3=1) 0.0171 0.0124 0.0070 0.0022 0.0020 0.0019 0.0017
Standard deviation 0.0197 0.0159 0.0110 0.0024 0.0021 0.0019 0.0010
(b) 110k=110|Z0,kΔt-Z0|
  ?
   
  2 000 5 000 10 000 50 000 100 000 200 000 300 000000
(θ1=1,θ2=1,θ3=12) 0.1221 0.1210 0.1190 0.1166 0.1187 0.1197 0.1201
Standard deviation 0.0303 0.0199 0.0147 0.0079 0.0037 0.0032 0.0025
(θ1=12,θ2=1,θ3=12) 0.0579 0.0578 0.0562 0.0537 0.0561 0.0575 0.0578
Standard deviation 0.0319 0.0235 0.0165 0.0081 0.0042 0.0036 0.0027
(θ1=12,θ2=12,θ3=12) 0.0991 0.0453 0.0295 0.0158 0.0144 0.0074 0.0058
Standard deviation 0.1111 0.0550 0.0312 0.0171 0.0173 0.0077 0.0060
(θ1=1,θ2=1,θ3=1) 0.1114 0.1111 0.1095 0.1072 0.1095 0.1107 0.1112
Standard deviation 0.0300 0.0230 0.0151 0.0079 0.0042 0.0035 0.0028
Absolute error versus ... for M=200,000. (a) .... (b) ....
Figure 2: Absolute error versus NT for M=200 000. (a) 110k=110|Y0,kΔt-Y0|. (b) 110k=110|Z0,kΔt-Z0|.
Convergence rate for (a) the Y-component and (b) the Z-component. Line with filled circle shows (...). Line with unfilled circle shows (...).
Figure 3: Convergence rate for (a) the Y-component and (b) the Z-component. Line with filled circle shows (θ1=1, θ2=1, θ3=1). Line with unfilled circle shows (θ1=12, θ2=1, θ3=12).
Table 2: Numerical results from (5.1).
(a) 110k=110|Y0,kΔt-Y0|
  ?  
     
  1 000 2 000 20 000 100 000 300 000 CR
NT 2 4 8 16 32  
(θ1=1,θ2=1,θ3=12) 0.0329 0.0194 0.0080 0.0045 0.0012 1.17
Standard deviation 0.0276 0.0224 0.0051 0.0020 9.8927e-04  
(θ1=12,θ2=1,θ3=12) 0.0262 0.0174 0.0056 0.0027 7.9174e-04 1.28
Standard deviation 0.0279 0.0226 0.0052 0.0020 9.9436e-04  
(b) 110k=110|Z0,kΔt-Z0|
  ?  
     
  1 000 2 000 20 000 100 000 300 000 CR
NT 2 4 8 16 32  
(θ1=1,θ2=1,θ3=12) 0.1142 0.0752 0.0235 0.0157 0.0092 0.95
Standard deviation 0.0273 0.0271 0.0193 0.0078 0.0055  
(θ1=12,θ2=1,θ3=12) 0.0516 0.0448 0.0149 0.0091 0.0066 0.82
Standard deviation 0.0285 0.0265 0.0180 0.0086 0.0056  
(c) Average run time (seconds)
  ?  
     
  1 000 2 000 20 000 100 000 300 000 CR
NT 2 4 8 16 32  
(θ1=1,θ2=1,θ3=12) 0.1 0.5 2.3 23.9 147.0  
(θ1=12,θ2=1,θ3=12) 0.1 0.2 2.3 25.5 144.4  

For the convergence with respect to the time step we refer to Figure 3, where we roughly adjust sample sizes M according to the time partitions. The results shown in Table 2 and Figure 1 are consistent with the statements in Theorem 4.2.

5.2 Example of two-dimensional FBSDE

For the two-dimensional FBSDE we consider the Heston stochastic volatility model (Heston 1993), which reads

  dSt=μStdt+νtStdWtS,dνt=κν(μν-νt)dt+σννtdWtν,dWtSdWtν=ρdt,}   (5.3)

where St is the spot price of the underlying asset and νt is the volatility. It is well known that the Heston model (5.3) can be reformulated as

  d?t=(dνtdSt)=(κν(μν-νt)μSt)dt+(σννt0StρνtSt1-ρ2νt)(dW~tνdW~tS),   (5.4)

where W~tν and W~tS are independent Brownian motions. To find the FBSDE form for the Heston model we consider the following self-financing strategy:

  dYt =atdU(t,νt,St)+btdSt+ctdPt   (5.5)
    =atdU(t,νt,St)+btdSt+(Yt-atU(t,νt,St)-btSt)PtdPt,   (5.6)

where U(t,νt,St) is the value of another option for hedging volatility; dPt=rPtdt is used for the risk-free asset; and at, bt and ct are numbers of the option, the underlying asset and the risk-free asset, respectively. We assume that

  dU(t,νt,St)=η(t,νt,St)dt,   (5.7)

which can be substituted into (5.6) to obtain

  -dYt =(atrU(t,νt,St)-atη(t,νt,St)-(μ-r)1-ρ2νtZt,2-rYt)dt-?t(dW~tνdW~tS),   (5.8)

with

  ?t=(Zt1,Zt2)=(atσννt+btStρνt,btSt1-ρ2νt).   (5.9)

In the Heston (1993) model, the market price of the volatility risk is assumed to be λνt. With the notation used in (5.4), the Heston pricing PDE including λ reads

  Vt+rSVS+(κν(μν-ν)-λν)Vν+12νS22VS2  
              +ρσννS2VSν+12σν2ν2Vν2-rV=0.   (5.10)

The solution of the FBSDE (5.8) is exactly the solution of the Heston PDE (5.10) by choosing rU(t,νt,St)-η(t,νt,St)-λνt. Equations (5.8) and (5.9) can thus be reformulated as

  -dYt =(-atλνt-(μ-r)1-ρ2νtZt2-rYt)dt-?t(dW~tνdW~tS)  
    =(-λνtσνZt1+(ρλνt1-ρ2σν-(μ-r)1-ρ2νt)Zt2-rYt)dt-?t(dW~tνdW~tS),   (5.11)

with ?t defined in (5.9). To the best of our knowledge, this is the first work in the literature to derive the Heston BSDE. Note that the generator in this example could be non-Lipschitz continuous. The European-style option can be replicated by hedging this portfolio. We consider a call option, for example, whose value at time t is identical to the portfolio value Yt, and YT=ξ=max(ST-K,0). Hence, Yt is the Heston option value V(t,νt,St) and ?t presents the hedging strategies, where

  Zt1=Vνσννt+VSStρνtandZt2=VSSt1-ρ2νt.  

The parameter values used for this numerical test are

  K=S0=50,r=0.03,μ=0.05,λ=0,T=0.5,  
  ν0=μν=0.04,κν=1.9,σν=0.1,ρ=-0.7,  

which give the exact solution Y0=3.1825. The forward processes dSt and dνt are simulated using the Euler method, and for the final values at the maturity T we take

  YNT,Δt=max(SNT,Δt-K,0),ZNT,1,Δt={SNT,ΔtρνNT,Δtwhen SNT,Δt>K,0otherwise,ZNT,2,Δt={SNT,Δt1-ρ2νNT,Δtwhen SNT,Δt>K,0otherwise,}   (5.12)

where =1,,M. The corresponding relative errors are reported in Table 3. We obtain a fairly accurate approximation for the Heston option price by solving the two-dimensional FBSDE, although the generator is not Lipschitz continuous. It is well known that a splitting scheme of the alternating direction implicit (ADI) type has been widely analyzed and applied to efficiently find the numerical solution of a two-dimensional parabolic PDE. We thus compare our tree-based approach to the Craig–Sneyd Crank–Nicolson ADI finite difference scheme (Craig and Sneyd 1988) for solving the Heston model in Table 3. We denote by NS and Nν the number of points for the stock price and the volatility grid, respectively. The ADI scheme is performed in the domain [0,2K] for S and [0,0.5] for ν, with a uniform grid NS=Nν=40; the time steps NT are given in Table 3. We can observe that the tree-based approach gives a better (at least, compatible) result.

Table 3: Comparison of our tree-based approach with the Craig–Sneyd Crank–Nicolson ADI finite different scheme. [The exact price is 3.1825. In part (a), CR0.96.]
(a) Tree-based approach (θ1=12,θ2=1,θ3=12)
  ?
   
  5 000 10 000 40 000 100 000 300 000
NT 2 4 8 16 32
110k=110|Y0,kΔt-Y0||Y0| 0.0207 0.0115 0.0043 0.0028 0.0015
Standard deviation 0.0840 0.0307 0.0173 0.0109 0.0051
Average run time (s) 0.2 0.9 7.4 38.8 241.3
(b) Craig–Sneyd Crank–Nicolson ADI scheme
NT 2 4 8 16 32
|YΔt-Y0||Y0| 0.0900 0.0103 0.0068 0.0062 0.0061
Average run time (s) 0.2 0.7 1.2 2.7 6.2
(c) Tree-based approach (θ1=12,θ2=1,θ3=12)
  ?
   
  100 500 1 000 5 000 10 000
NT 8 8 8 8 8
110k=110|Y0,kΔt-Y0||Y0| 0.1181 0.0612 0.0278 0.0162 0.0051
Standard deviation 0.4637 0.2137 0.1171 0.0561 0.0183
Average run time (s) 0.1 0.2 0.2 1.0 1.9

5.3 Examples of nonlinear FBSDE

In this section we test our scheme for nonlinear high-dimensional problems. We find that nonlinear training data may lead to overfitting when directly using the procedure introduced above. Therefore, to avoid overfitting for the nonlinear problems, we propose to control the error while already growing a tree. To do this, we estimate the cross-validation mean squared errors of the trees, which are constructed with a different number of observations in each branch node. Clearly, the best tree (ie, that with the optimal number of observations in each branch node) can be determined by comparing the errors. Theoretically, for the best results, the error control needs to be performed for each time step. However, that would be too computationally expensive. Fortunately, in our test we find that the optimal numbers of observations for each time step are very close to each other. For substantially less computation time, we only need to determine one of these, eg, for the first iteration, and then we need to fix it for all other iterations. We note that the pruning procedure cannot bring considerable improvement when the tree has been grown using the optimal number of observations, and it is thus unnecessary in this case.

Errors versus ... for (...) in the three-dimensional case. (a) .... (b) ....
Figure 4: Errors versus NT for (5.13) in the three-dimensional case. (a) 110k=110|Y0,kΔt-Y0|. (b) 110k=1101Dd=1D|Z0,kd,Δt-Z0d|.
Errors versus ... for (...) in the 100-dimensional case. (a) .... (b) ....
Figure 5: Errors versus NT for (5.13) in the 100-dimensional case. (a) 110k=110|Y0,kΔt-Y0|. (b) 110k=1101Dd=1D|Z0,kd,Δt-Z0d|.
Table 4: Comparison of our approach with the multilevel Monte Carlo method (see E et al 2019, Table 5). [The table shows the tree-based approach (θ1=12, θ2=1, θ3=12) for NT=10. The reference prices are Y0=sin(1) and Z0=(0,,0)T.]
  (a) D=3
  ?
   
  10 000 20 000 50 000 100 000 200 000 300 000
110k=110|Y0,kΔt-Y0| 0.0221 0.0024 0.0018 0.0011 0.0011 0.0009
Standard deviation 0.0027 0.0023 0.0023 0.0013 0.0013 0.0007
110k=1101Dd=1D|Z0,kd,Δt-Z0d| 0.0247 0.0167 0.0100 0.0068 0.0050 0.0050
Standard deviation 0.0117 0.0068 0.0041 0.0017 0.0024 0.0021
Average run time (s) 0.3 0.9 1.8 4.4 8.8 13.2
  (b) D=100
  ?
   
  10 000 20 000 50 000 100 000 200 000 300 000
110k=110|Y0,kΔt-Y0| 0.2729 0.1140 0.0405 0.0180 0.0081 0.0050
Standard deviation 0.0129 0.0033 0.0022 0.0008 0.0004 0.0002
110k=1101Dd=1D|Z0,kd,Δt-Z0d| 0.0281 0.0175 0.0101 0.0072 0.0047 0.0039
Standard deviation 0.0019 0.0012 0.0009 0.0007 0.0005 0.0004
Average run time (s) 35.5 80.4 215.1 450.4 1075.1 2015.7

We first consider a modified version of BSDE with quadratically growing derivatives (derived in Gobet and Turkedjiev (2015)), whose explicit solution is known. The BSDE in the 100-dimensional case is analyzed numerically in E et al (2017) and given by

  -dYt=Z1×D2-ψ(t,Wt)D2-(t+12Δ)ψ(t,Wt)dt-ZtdWt,   (5.13)

with the analytical solution

  Yt=ψ(t,Wt)=sin((T-t+1DWtD2)α),Zt=2αWtTcos((T-t+1DWtD2)α)(T-t+1DWtD2)α-1,}   (5.14)

where α(0,12]. Letting α=0.4, T=1, we test our scheme for D=3 and D=100 in order to compare our approach with methods in Gobet and Turkedjiev (2015) and E et al (2017). We plot the absolute errors of Y and Z against the number of steps in Figures 4 and 5 for the 3-dimensional and 100-dimensional cases, respectively, where the sample sizes M are roughly adjusted according to the time partitions. We obtain very good numerical results and reach an error of order 10-3. To show the errors against the sample size M, we fix NT=10 and report the errors in Table 4, where the convergence in a stable fashion is shown. Further, a better approximation (smaller standard deviations) can always be achieved with larger M. Note that the sample-splittings Mg=20 000 and Mg=100 000 are used for 3-dimensional and 100-dimensional problems, respectively, when M>50 000.

Errors versus ... for pricing with different interest rates for the Y-component in the one-dimensional case.
Figure 6: Errors versus NT for pricing with different interest rates for the Y-component in the one-dimensional case.
Table 5: Comparison of our approach with the multilevel Monte Carlo method (see E et al 2019, Table 5). [The reference price is Y0=7.156 for NT=10 in part (a).]
(a) Tree-based approach (θ1=12,θ2=1,θ3=12)
  ?
   
  2 000 4 000 10 000 20 000 50 000 100 000 200 000
110k=110|Y0,kΔt-Y0||Y0| 0.0239 0.0152 0.0107 0.0073 0.0043 0.0035 0.0013
Standard deviation 0.2089 0.1100 0.0953 0.0686 0.0364 0.0352 0.0130
Average run time (s) 0.1 0.1 0.2 0.3 0.9 1.8 3.7
(b) Multilevel Monte Carlo method
  No. of Picard iterations
   
  1 2 3 4 5 6 7
110k=110|Y0,kΔt-Y0||Y0| 0.8285 0.4417 0.1777 0.1047 0.0170 0.0086 0.0019
Standard deviation 7.7805 4.0799 1.6120 0.8106 0.1512 0.0714 0.0157
Average run time (s) 0.0 0.0 0.0 0.3 3.1 38.7 1915.1
Errors versus ... for pricing with different interest rates for the Y-component in the 100-dimensional case.
Figure 7: Errors versus NT for pricing with different interest rates for the Y-component in the 100-dimensional case.
Table 6: Comparison to the multilevel Monte Carlo method (see E et al 2019, Table 6). [The reference price is Y0=21.2988. The average run time is 2725.1 seconds. NT=10 in part (a).]
  (a) Tree-based approach (θ1=12,θ2=1,θ3=12)
  ?
   
  10 000 20 000 50 000 100 000 200 000 300 000
110k=110|Y0,kΔt-Y0||Y0| 0.0212 0.0111 0.0027 0.0026 0.0025 0.0022
Standard deviation 0.0652 0.0622 0.0283 0.0234 0.0179 0.0135
Average run time (s) 20.1 43.2 112.6 224.9 450.8 677.3
  (b) Multilevel Monte Carlo method
  No. of Picard iterations
   
  1 2 3 4 5 6
110k=110|Y0,kΔt-Y0||Y0| 0.4415 00.4573 0.1798 0.1042 0.0509 0.0474
Standard deviation 8.7977 11.3167 4.4920 2.9533 1.4486 1.3757
Average run time (s) 0.0 0.0 0.0 0.4 4.2 52.9

As another example, we consider a pricing problem of a European option in a financial market with different interest rates for borrowing and lending to hedge the European option. We suppose that D stocks, which are for simplicity assumed to be independent and identically distributed, are driven by

  dSt,d=μSt,ddt+σSt,ddWt,d,d=1,,D,   (5.15)

where σ>0 and μ. The terminal condition and generator for the option pricing with different interest rate are given by

  YT=ξ=max(maxd=1,,D(ST,d)-K1,0)-2max(maxd=1,,D(ST,d)-K2,0)   (5.16)

and

  f(t,x,y,z)=-Rly-μ-Rlσd=1Dzd+(Rb-Rl)max(0,1σd=1Dzd-y),   (5.17)

respectively, where Rb,Rl are different interest rates and K1, K2 are strikes. Obviously, (5.16) and (5.17) are both nonlinear.

We first consider a one-dimensional case, in which we use YT=ξ=max(ST-100,0) instead of (5.16) to agree with the setting in Gobet et al (2005) and E et al (2017). The parameter values are set as T=0.5, μ=0.06, σ=0.02, Rl=0.04 and Rb=0.06. We use Y0=7.156, computed using the finite-difference method, as the reference price. Note that the reference price is confirmed in Gobet et al (2005) as well. As in the previous example, we plot in Figure 6 the relative error against the number of steps, which gives very good numerical results. In Table 5 we compare our results with the results in E et al (2019, Table 5), and we show that the tree-based approach with NT=10 can reach the accuracy level of the multilevel Monte Carlo with seven Picard iterations in significantly less computational time. Finally, we test our scheme for the 100-dimensional nonlinear pricing problem. Figure 7 compares the errors against the number of steps NT. In Table 6 we compare our results with those in E et al (2019, Table 6). The reference price Y0=21.2988 is computed using the multilevel Monte Carlo with seven Picard iterations, whereas K1=120, K2=150 and values of other parameters are the same as those for the one-dimensional case. For a comparison of computational cost in terms of the number of splitting groups, we set Mg=50 000 for M>50 000, ie, there are twice as many splitting groups as in the latter 100-dimensional example. We see that our result with NT=10,M=10 000 is already better than the approximation of multilevel Monte Carlo with six iterations for less computational time. Further, compared with the results in Table 4, the computational expenses are substantially reduced since the samples are split into more groups. Note that the same reference price is used to compare the deep-learning-based numerical methods for high-dimensional BSDEs in E et al (2017, Table 3), which achieved a relative error of 0.0039 in a run time of 566 seconds.

6 Conclusion

In this paper we have studied ways of solving forward–backward stochastic differential equations numerically using regression-tree-based methods. We showed how to use the regression tree to approximate the conditional expectations arising by discretizing the time integrands using the general theta-discretization method. We performed several numerical experiments for different types of (F)BSDEs, including their application to 100-dimensional nonlinear pricing problems. Our numerical results are quite promising and indicate that the tree-based approach is very attractive for solving high-dimensional nonlinear (F)BSDEs.

Declaration of interest

The author reports no conflicts of interest. The author alone is responsible for the content and writing of the paper.

Acknowledgements

The author thanks the referees for their valuable suggestions, which greatly improved the paper.

References

  • Bender, C., and Steiner, J. (2012). Least-squares Monte Carlo for backward SDEs. Numerical Methods in Finance 12, 257–289 (https://doi.org/10.1007/978-3-642-25746-9_8).
  • Bouchard, B., and Touzi, N. (2004). Discrete-time approximation and Monte Carlo simulation of backward stochastic differential equations. Stochastic Processes and Their Applications 111, 175–206 (https://doi.org/10.1016/j.spa.2004.01.001).
  • Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Taylor & Francis.
  • Craig, I. J. D., and Sneyd, A. D. (1988). An alternating-direction implicit scheme for parabolic equations with mixed derivatives. Computers and Mathematics with Applications 16, 341–350 (https://doi.org/10.1016/0898-1221(88)90150-2).
  • Crisan, D., and Manolarakis, K. (2010). Solving backward stochastic differential equations using the cubature method: application to nonlinear pricing. SIAM Journal on Financial Mathematics 3(1), 534–571 (https://doi.org/10.1137/090765766).
  • E, W., Han, J., and Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5(4), 349–380 (https://doi.org/10.1007/s40304-017-0117-6).
  • E, W., Hutzenthaler, M., Jentzen, A., and Kruse, T. (2019). On multilevel picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. Journal of Scientific Computing 79, 1534–1571 (https://doi.org/10.1007/s10915-018-00903-0).
  • Gobet, E., and Turkedjiev, P. (2015). Linear regression MDP scheme for discrete backward stochastic differential equations under general conditions. Mathematics of Computation 85(299), 1359–1391 (https://doi.org/10.1090/mcom/3013).
  • Gobet, E., Lemor, J. P., and Warin, X. (2005). A regression-based Monte Carlo method to solve backward stochastic differential equations. Annals of Applied Probability 15(3), 2172–2202 (https://doi.org/10.1214/105051605000000412).
  • Heston, S. L. (1993). A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies 6(2), 327–343 (https://doi.org/10.1093/rfs/6.2.327).
  • Karoui, N. E., Kapoudjan, C., Pardoux, E., Peng, S., and Quenez, M. C. (1997a). Reflected solutions of backward stochastic differential equations and related obstacle problems for PDEs. Annals of Probability 25, 702–737.
  • Karoui, N. E., Peng, S., and Quenez, M. C. (1997b). Backward stochastic differential equations in finance. Mathematical Finance 7(1), 1–71 (https://doi.org/10.1111/1467-9965.00022).
  • Lemor, J., Gobet, E., and Warin, X. (2006). Rate of convergence of an empirical regression method for solving generalized backward stochastic differential equations. Bernoulli 12, 889–916 (https://doi.org/10.3150/bj/1161614951).
  • Lepeltier, J. P., and Martin, J. S. (1997). Backward stochastic differential equations with continuous generator. Statistics and Probability Letters 32, 425–430 (https://doi.org/10.1016/S0167-7152(96)00103-4).
  • Li, Y., Yang, J., and Zhao, W. (2017). Convergence error estimates of the Crank–Nicolson scheme for solving decoupled FBSDEs. Science China Mathematics 60, 923–948 (https://doi.org/10.1007/s11425-016-0178-8).
  • Ma, J., and Zhang, J. (2005). Representations and regularities for solutions to BSDEs with reflections. Stochastic Processes and Their Applications 115, 539–569 (https://doi.org/10.1016/j.spa.2004.05.010).
  • Ma, J., Protter, P., and Yong, J. (1994). Solving forward–backward stochastic differential equations explicity: a four step scheme. Probability Theory and Related Fields 98(3), 339–359 (https://doi.org/10.1007/BF01192258).
  • Ma, J., Protter, P., Martín, J. S., and Torres, S. (2002). Numerical method for backward stochastic differential equations. Annals of Applied Probability 12, 302–316 (https://doi.org/10.1214/aoap/1015961165).
  • Martinez, W. L., and Martinez, A. R. (2007). Computational Statistics Handbook with MATLAB, 2nd edn. CRC Press, Boca Raton, FL (https://doi.org/10.1201/b13622).
  • Pardoux, E., and Peng, S. (1990). Adapted solution of a backward stochastic differential equations. System and Control Letters 14, 55–61 (https://doi.org/10.1016/0167-6911(90)90082-6).
  • Peng, S. (1991). Probabilistic interpretation for systems of quasilinear parabolic partial differential equations. Stochastics and Stochastic Reports 37(1–2), 61–74 (https://doi.org/10.1080/17442509108833727).
  • Ruijter, M. J., and Oosterlee, C. W. (2015). A Fourier cosine method for an efficient computation of solutions to BSDEs. SIAM Journal on Scientific Computing 37(2), A859–A889 (https://doi.org/10.1137/130913183).
  • Teng, L., Lapitckii, A., and Günther, M. (2020). A multi-step scheme based on cubic spline for solving backward stochastic differential equations. Applied Numerical Mathematics 150, 117–138 (https://doi.org/10.1016/j.apnum.2019.09.016).
  • Zhao, W., Chen, L., and Peng, S. (2006). A new kind of accurate numerical method for backward stochastic differential equations. SIAM Journal on Scientific Computing 28(4), 1563–1581 (https://doi.org/10.1137/05063341X).
  • Zhao, W., Wang, J., and Peng, S. (2009). Error estimates of the θ-scheme for backward stochastic differential equations. Discrete and Continuous Dynamical Systems B 12, 905–924 (https://doi.org/10.3934/dcdsb.2009.12.905).
  • Zhao, W., Zhang, G., and Ju, L. (2010). A stable multistep scheme for solving backward stochastic differential equations. SIAM Journal on Numerical Analysis 48, 1369–1394 (https://doi.org/10.1137/09076979X).
  • Zhao, W., Li, Y., and Zhang, G. (2012). A generalized θ-scheme for solving backward stochastic differential equations. Discrete and Continuous Dynamical Systems B 17(5), 1585–1603 (https://doi.org/10.3934/dcdsb.2012.17.1585).
  • Zhao, W., Li, Y., and Ju, L. (2013). Error estimates of the Crank–Nicolson scheme for solving backward stochastic differential equations. International Journal of Numerical Analysis and Modeling 10(4), 876–898.
  • Zhao, W., Zhang, W., and Ju, L. (2014). A numerical method and its error estimates for the decoupled forward–backward stochastic differential equations. Communications in Computational Physics 15, 618–646 (https://doi.org/10.4208/cicp.280113.190813a).

Only users who have a paid subscription or are part of a corporate subscription are able to print or copy content.

To access these options, along with all other subscription benefits, please contact [email protected] or view our subscription options here: http://subscriptions.risk.net/subscribe

You are currently unable to copy this content. Please contact [email protected] to find out more.

You need to sign in to use this feature. If you don’t have a Risk.net account, please register for a trial.

Sign in
You are currently on corporate access.

To use this feature you will need an individual account. If you have one already please sign in.

Sign in.

Alternatively you can request an individual account here: