The Markov Property

Stochastic Calculus
Author

Quasar

Published

July 12, 2024

The Markov Property for Diffusions

Let’s start by exhibiting the Markov property of Brownian motion. To see this, consider \((\mathcal{F}_t,t\geq 0)\), the natural filtration of the Brownian motion \((B_t,t\geq 0)\). Consider \(g(B_t)\) for some time \(t\) and bounded function \(g\). (For example, \(g\) could be an indicator function.) Consider also a random variable \(W\) that is \(\mathcal{F}_s\) measurable for \(s < t\). (For example, \(W\) could be \(B_s\) or \(1_{B_s > 0}\).) Let’s compute \(\mathbb{E}[g(B_t)W]\).

\[\begin{align*} \mathbb{E}[g(B_t)W] &= \mathbb{E}[\mathbb{E}[Wg(B_t - B_s + B_s)|\mathcal{F}_s]] \end{align*}\]

The random variable \((B_t - B_s)\) follows a \(\mathcal{N}(0,t-s)\) distribution. By LOTUS,

\[\begin{align*} \mathbb{E}[g(B_t)W] &= \int_{\mathbb{R}} \mathbb{E}[W g(y + B_s)]\,f_{B_t - B_s}(y)\, dy\\ &\quad \{\text{ using the fact that }B_t - B_s\text{ is independent of }(W, B_s)\text{ and has density }f_{B_t - B_s}\}\\ &= \int_{\mathbb{R}} \mathbb{E}[W g(y + B_s)]\frac{e^{-\frac{y^2}{2(t-s)}}}{\sqrt{2\pi(t-s)}}dy \end{align*}\]

By Fubini’s theorem, the integral and the expectation operator can be interchanged, so \(\mathbb{E}[g(B_t)W] = \mathbb{E}\left[W \int_{\mathbb{R}} g(y+B_s)f_{B_t-B_s}(y)dy\right]\). Since this holds for every bounded \(\mathcal{F}_s\)-measurable \(W\), it follows from the definition of conditional expectation that:

\[ \begin{align*} \mathbb{E}[g(B_t)|\mathcal{F}_s] &= \int_{\mathbb{R}} g(y + B_s) \frac{e^{-\frac{y^2}{2(t-s)}}}{\sqrt{2\pi(t-s)}}dy \end{align*} \tag{1}\]

We make two important observations. First, the right-hand side is a function of \(s\), \(t\) and \(B_s\) only (and not of the Brownian motion before time \(s\)). In particular, we have:

\[ \mathbb{E}[g(B_t)|\mathcal{F}_s] = \mathbb{E}[g(B_t)|B_s] \]

This holds for any bounded function \(g\). In particular, it holds for all indicator functions. This implies that the conditional distribution of \(B_t\) given \(\mathcal{F}_s\) depends solely on \(B_s\), and not on other values before time \(s\). Second, the right-hand side is time-homogeneous in the sense that it depends on \(s\) and \(t\) only through the difference \(t-s\).

We have just shown that Brownian motion is a time-homogeneous Markov process.
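As a quick numerical sanity check of Equation 1 (an addition for illustration, not part of the original argument), the sketch below takes \(g = \mathbf{1}_{(0,\infty)}\) and compares a Monte Carlo estimate of \(\mathbb{E}[g(B_t)\,|\,B_s = x]\) with the Gaussian integral on the right-hand side; the values of \(s\), \(t\) and \(x\) are arbitrary choices.

import numpy as np
from scipy.stats import norm

# sanity check of Equation 1 with g = indicator of (0, inf) (illustrative parameters)
rng = np.random.default_rng(0)
s, t, x = 1.0, 2.0, 0.5
increments = rng.normal(0.0, np.sqrt(t - s), size=100_000)   # B_t - B_s ~ N(0, t - s)
mc_estimate = np.mean((x + increments) > 0.0)                # Monte Carlo average of g(B_t) given B_s = x
closed_form = 1.0 - norm.cdf(-x / np.sqrt(t - s))            # the Gaussian integral in closed form
print(mc_estimate, closed_form)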

Definition 1 (Markov process.) Consider a stochastic process \((X_t,t\geq 0)\) and its natural filtration \((\mathcal{F}_t,t\geq 0)\). It is said to be a Markov process if and only if for any (bounded) function \(g: \mathbb{R} \to \mathbb{R}\), we have:

\[ \mathbb{E}[g(X_t) | \mathcal{F}_s] = \mathbb{E}[g(X_t) | X_s], \quad \forall t \geq 0, \forall s \leq t \tag{2}\]

This implies that \(\mathbb{E}[g(X_t)|\mathcal{F}_s]\) is an explicit function of \(s\), \(t\) and \(X_s\). It is said to be time-homogeneous if this function depends on \(s\) and \(t\) only through \(t-s\) (and on \(X_s\)). Since the above holds for all bounded \(g\), the conditional distribution of \(X_t\) given \(\mathcal{F}_s\) is the same as the conditional distribution of \(X_t\) given \(X_s\).

One way to compute the conditional distribution of \(X_t\) given \(\mathcal{F}_s\) is to compute the conditional MGF given \(\mathcal{F}_s\), that is:

\[ \mathbb{E}[e^{a X_t}|\mathcal{F}_s], \quad a \geq 0 \tag{3}\]

The process is then Markov if the conditional MGF is an explicit function of \(s\), \(t\) and \(X_s\).

Example 1 (Brownian Motion is Markov) Let \((B_t,t\geq 0)\) be a standard Brownian motion. Our claim is that Brownian motion is a Markov process.

Proof.

We have:

\[\begin{align*} \mathbb{E}[e^{a B_t}|\mathcal{F}_s] &= \mathbb{E}[e^{a (B_t - B_s + B_s)}|\mathcal{F}_s]\\ & \{ \text{ since }B_s \text{ is }\mathcal{F}_s-\text{ measurable }\}\\ &= e^{a B_s} \mathbb{E}[e^{a (B_t - B_s)}|\mathcal{F}_s]\\ & \{ \text{ since }B_t - B_s \perp \mathcal{F}_s \}\\ &= e^{a B_s} \mathbb{E}[e^{a (B_t - B_s)}]\\ &= e^{a B_s} e^{\frac{1}{2}a^2(t-s)} \end{align*}\]

This closes the proof. \(\blacksquare\)

An equivalent (but more symmetric) way to express the Markov property is to say that the future of the process is independent of the past, when conditioned on the present. Concretely, this means that for any \(r < s< t\), we have that \(X_t\) is independent of \(X_r\), when we condition on \(X_s\).

The conditional distribution of \(X_t\) given \(X_s\) is well described using transition probabilities. We will mostly be interested in the case where these probabilities admit a density \(f_{X_t|X_s=x}(y)\). More precisely, for such a Markov process, we have:

\[ \begin{align*} \mathbb{E}[g(X_t)|X_s = x] &= \int_{\mathbb{R}} g(y) f_{X_t|X_s=x}(y) dy\\ &=\int_{\mathbb{R}} g(y) p(y,t|x,s) dy \end{align*} \]

Here, we explicitly write the left-hand side as a function of space, that is, the position \(X_s\), by fixing \(X_s = x\). In words, the transition probability density \(p(y,t|x,s)\) represents the probability density that, starting from \(X_s = x\) at time \(s\), the process ends up at \(X_t = y\) at time \(t > s\). If the process is time-homogeneous, this density depends on \(s\) and \(t\) only through the difference \(t-s\), and we may write it as \(p(y,t-s|x,0)\). From Equation 1, we can write: \[ \mathbb{E}[g(B_t)|B_s = x] = \int_{\mathbb{R}} g(u + x) \frac{e^{-\frac{u^2}{2(t-s)}}}{\sqrt{2\pi(t-s)}} du \]

In the above expression, the random variable \(B_t - B_s\) takes some value \(u \in \mathbb{R}\) and \(B_s = x\) is fixed. Then, \(B_t\) takes the value \(u + x\). Let \(y = u + x\). Then, \(u = y - x\). Consequently, we may write:

\[ \mathbb{E}[g(B_t)|B_s = x] = \int_{\mathbb{R}} g(y) \frac{e^{-\frac{(y-x)^2}{2(t-s)}}}{\sqrt{2\pi(t-s)}} dy \]

So, the transition density function for standard Brownian motion is:

\[ p(y,t|x,0)= \frac{e^{-\frac{(y-x)^2}{2t}}}{\sqrt{2\pi t}}, \quad t>0, x,y\in\mathbb{R} \tag{4}\]

This function is sometimes called the heat kernel, as it relates to the heat equation.

The Markov property is very convenient for computing quantities, as we shall see throughout the chapter. As a first example, we remark that it is easy to express joint probabilities of a Markov process \((X_t,t\geq 0)\) at different times. Consider the functions \(f = \mathbf{1}_A\) and \(g = \mathbf{1}_B\) from \(\mathbb{R} \to \mathbb{R}\), where \(A\) and \(B\) are two intervals in \(\mathbb{R}\). Let’s compute \(\mathbb{P}(X_{t_1} \in A, X_{t_2} \in B) = \mathbb{E}[\mathbf{1}_{A}(X_{t_1}) \mathbf{1}_{B}(X_{t_2})] = \mathbb{E}[f(X_{t_1}) g(X_{t_2})]\) for \(t_1 < t_2\). By the properties of conditional expectation and the Markov property, we have:

\[\begin{align*} \mathbb{P}(X_{t_1} \in A, X_{t_2} \in B) &= \mathbb{E}[f(X_{t_1})g(X_{t_2})]\\ &= \mathbb{E}[f(X_{t_1})\mathbb{E}[g(X_{t_2})|\mathcal{F}_{t_1}]]\\ &= \mathbb{E}[f(X_{t_1})\mathbb{E}[g(X_{t_2})|X_{t_1}]] \end{align*}\]

Assuming that the process is time-homogeneous and admits a transition density \(p(y,t|x,0)\), as for Brownian motion, this becomes:

\[\begin{align*} \mathbb{P}(X_{t_1} \in A, X_{t_2} \in B) &= \int_{\mathbb{R}} f(x_1) \left(\int_{\mathbb{R}} g(x_2) p(x_2,t_2|x_1,t_1) dx_2 \right) p(x_1,t_1|x_0,0) dx_1\\ &= \int_{A} \left(\int_{B} p(x_2,t_2|x_1,t_1) dx_2 \right) p(x_1,t_1|x_0,0) dx_1 \end{align*}\]

This easily generalizes to any finite-dimensional distribution of \((X_t, t\geq 0)\).
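A minimal numerical illustration of this nested-integral formula (an addition, not part of the original text), assuming standard Brownian motion started at \(x_0 = 0\) with the illustrative choices \(A = (0,1)\), \(B = (0,2)\), \(t_1 = 0.5\), \(t_2 = 1\); the quadrature result is compared with a direct Monte Carlo estimate.

import numpy as np
from scipy import integrate

# heat kernel p(y, t | x, 0) for standard Brownian motion
def heat_kernel(y, t, x):
    return np.exp(-(y - x) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

t1, t2, x0 = 0.5, 1.0, 0.0
inner = lambda x1: integrate.quad(lambda x2: heat_kernel(x2, t2 - t1, x1), 0.0, 2.0)[0]
prob, _ = integrate.quad(lambda x1: inner(x1) * heat_kernel(x1, t1, x0), 0.0, 1.0)

# Monte Carlo check using independent Gaussian increments
rng = np.random.default_rng(1)
b1 = x0 + rng.normal(0.0, np.sqrt(t1), size=200_000)
b2 = b1 + rng.normal(0.0, np.sqrt(t2 - t1), size=200_000)
print(prob, np.mean((b1 > 0.0) & (b1 < 1.0) & (b2 > 0.0) & (b2 < 2.0)))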

Example 2 (Markov versus Martingale.) Martingales are not Markov processes in general, and Markov processes are not martingales in general. Some processes, such as Brownian motion, enjoy both properties. An example of a Markov process that is not a martingale is Brownian motion with drift, \(X_t = \sigma B_t + \mu t\) with \(\mu \neq 0\). Conversely, take \(Y_t = \int_0^t X_s dB_s\), where \(X_s = \int_0^s B_u dB_u\). The process \((Y_t, t\geq 0)\) is a martingale, but its future increments depend on the Brownian motion at time \(t\), which cannot be recovered from the value of \(Y_t\) alone, so \((Y_t, t\geq 0)\) is not Markov for its own filtration.

Note 1: Functions of Markov Processes

It might be tempting to think that if \((X_t,t\geq 0)\) is a Markov process, then the process defined by \(Y_t = f(X_t)\) for some reasonable function \(f\) is also Markov. Indeed, one could hope to write for an arbitrary bounded function \(g\):

\[ \begin{align*} \mathbb{E}[g(Y_t)|\mathcal{F}_s] = \mathbb{E}[g(f(X_t))|\mathcal{F}_s] = \mathbb{E}[g(f(X_t))|X_s] \end{align*} \tag{5}\]

by using the Markov property of \((X_t,t\geq 0)\). The flaw in this reasoning is that the Markov property should hold for the natural filtration \((\mathcal{F}_t^Y,t\geq 0)\) of the process \((Y_t,t\geq 0)\), and not for that of \((X_t,t\geq 0)\), \((\mathcal{F}_t^X,t\geq 0)\). It might be that the filtration \((\mathcal{F}_t^Y,t\geq 0)\) has less information than \((\mathcal{F}_t^X,t\geq 0)\), especially if the function \(f\) is not one-to-one. For example, if \(f(x)=x^2\), then \(\mathcal{F}_t^Y\) has less information than \(\mathcal{F}_t^X\), as we cannot recover the sign of \(X_t\) knowing \(Y_t\). In other words, the second equality may not hold. In some cases, a function of a Brownian motion might still be Markov, even when \(f\) is not one-to-one.

It turns out that diffusions such as the Ornstein-Uhlenbeck process and the Brownian bridge are Markov processes.

Theorem 1 (Diffusions are Markov processes.) Let \((B_t,t\geq 0)\) be a standard Brownian motion. Let \(\mu : \mathbb{R} \to \mathbb{R}\) and \(\sigma: \mathbb{R} \to \mathbb{R}\) be differentiable functions with bounded derivatives on \([0,T]\). Then, the diffusion with the SDE

\[ dX_t = \mu(X_t) dt + \sigma(X_t)dB_t, \quad X_0 = x_0 \]

defines a time-homogeneous Markov process on \([0,T]\).

An analogous statement holds for time-inhomogeneous diffusions. The proof is a generalization of the argument for the Markov property of Brownian motion; it takes advantage of the independence of Brownian increments.

Proof.

By the existence and uniqueness theorem, this stochastic initial value problem (SIVP) defines a unique continuous adapted process \((X_t,t\leq T)\). Let \((\mathcal{F}_t^X,t\geq 0)\) be the natural filtration of \((X_t,t\leq T)\). For a fixed \(t > 0\), consider the process \(W_s = B_{t+s} - B_t, s \geq 0\). Let \((\mathcal{F}_t,t \geq 0)\) be the natural filtration of \((B_t,t \geq 0)\). It turns out that the process \((W_s,s \geq 0)\) is a standard Brownian motion independent of \(\mathcal{F}_t\) (Exercise 1). For \(s \geq 0\), we consider the SDE:

\[ dY_s = \mu (Y_s) ds + \sigma(Y_s) dW_s, \quad Y_0 = X_t \]

Again by the existence and uniqueness theorem, there exists a unique solution to this SIVP that is adapted to the natural filtration of \(W\). Note that the shifted process \((X_{t+s},s\geq 0)\) is the solution to this SIVP since:

\[\begin{align*} X_{t+s} &= X_{t} + \int_{t}^{t+s}\mu(X_u) du + \int_{t}^{t+s}\sigma(X_u) dB_u \end{align*}\]

Perform a change of variable \(v = u - t\). Then, \(dv = du\), \(dB_u = B(u_2) - B(u_1)= B(t + v_2) - B(t + v_1) = W(v_2) - W(v_1) = dW_v\). So,

\[\begin{align*} X_{t+s} &= X_{t} + \int_{0}^{s}\mu(X_{t+v}) dv + \int_{0}^{s}\sigma(X_{t+v}) dW_v \end{align*}\]

Let \(Y_v= X_{t+v}\), \(Y_0 = X_t\). Then,

\[\begin{align*} Y_s &= Y_0 + \int_{0}^{s}\mu(Y_v) dv + \int_{0}^{s}\sigma(Y_v) dW_v \end{align*}\]

Thus, we conclude that for any interval \(A\):

\[ \mathbb{P}(X_{t+s} \in A|\mathcal{F}_t^X) = \mathbb{P}(Y_s \in A | \mathcal{F}_t^X) \]

But, since \((Y_s,s \geq 0)\) depends on \(\mathcal{F}_t^X\) only through \(X_t\) (because \((W_s,s \geq 0)\) is independent of \(\mathcal{F}_t\)), we conclude that \(\mathbb{P}(X_{t+s} \in A|\mathcal{F}_t^X) = \mathbb{P}(X_{t+s} \in A|X_t)\), so \((X_t,t \geq 0)\) is a time-homogeneous Markov process. \(\blacksquare\)

The Strong Markov Property

Doob’s optional stopping theorem extends some properties of martingales to stopping times. The Markov property can also be extended to stopping times for certain processes. These processes are called strong Markov processes.

We know that the sigma-algebra \(\mathcal{F}_t\) represents the set of all observable events up to time \(t\). What is the sigma-algebra of observable events at a random stopping time \(\tau\)?

Definition 2 (\(\sigma\)-algebra of \(\tau\)-past) Let \((\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\geq 0},\mathbb{P})\) be a filtered probability space. The sigma-algebra at the stopping time \(\tau\) is then:

\[ \mathcal{F}_{\tau} = \{A \in \mathcal{F}_\infty : A \cap \{\tau \leq t\} \in \mathcal{F}_t, \forall t \geq 0 \} \tag{6}\]

In words, an event \(A\) is in \(\mathcal{F}_\tau\), if we can determine if \(A\) and \(\{\tau \leq t\}\) both occurred or not based on the information \(\mathcal{F}_t\) known at any arbitrary time \(t\). You should be able to tell the value of the random variable \(\mathbf{1}_A \cdot \mathbf{1}_{\{\tau \leq t\}}\) given \(\mathcal{F}_t\) for any arbitrary time \(t \geq 0\).

For example, if \(\tau < \infty\), the event \(\{B_\tau > 0\}\) is in \(\mathcal{F}_\tau\). However, the event \(A = \{B_1 > 0\}\) is not in \(\mathcal{F}_\tau\) in general, since \(A \cap \{\tau \leq t\}\) need not be in \(\mathcal{F}_t\) for \(t < 1\). Roughly speaking, a random variable that is \(\mathcal{F}_\tau\)-measurable should be thought of as an explicit function of the process up to time \(\tau\). With this new object, we are ready to define the strong Markov property.

Definition 3 (Strong Markov Property) Let \((X_t,t\geq 0)\) be a stochastic process and let \((\mathcal{F}_t,t\geq 0)\) be its natural filtration. The process \((X_t,t\geq 0)\) is said to be strong Markov if for any stopping time \(\tau\) for the filtration of the process and any bounded function \(g\):

\[ \mathbb{E}[g(X_{t+\tau})|\mathcal{F}_\tau] = \mathbb{E}[g(X_{t+\tau})|X_\tau] \]

This means that \(X_{t+\tau}\) depends on \(\mathcal{F}_\tau\) solely through \(X_\tau\) (whenever \(\tau < \infty\)). It turns out that Brownian motion is a strong Markov process. In fact, a stronger statement holds, which generalizes Exercise 1.

Theorem 2 Let \(\tau\) be a stopping time for the filtration of the Brownian motion \((B_t,t\geq 0)\) such that \(\tau < \infty\). Then, the process:

\[ (B_{t+\tau} - B_{\tau},t\geq 0) \]

is a standard Brownian motion independent of \(\mathcal{F}_\tau\).

Example 3 (Brownian motion is strong Markov) To see this, let’s compute the conditional MGF as in Equation 3. We have:

\[ \begin{align*} \mathbb{E}[e^{aB_{t+\tau}}|\mathcal{F}_\tau] &= \mathbb{E}[e^{a(B_{t+\tau} - B_\tau + B_\tau)}|\mathcal{F}_\tau]\\ &= e^{aB_\tau} \mathbb{E}[e^{a(B_{t+\tau} - B_\tau)}|\mathcal{F}_\tau]\\ & \{ B_\tau \text{ is }\mathcal{F}_\tau-\text{measurable }\}\\ &= e^{aB_\tau}\mathbb{E}[e^{a(B_{t+\tau} - B_\tau)}]\\ & \{ (B_{t+\tau} - B_\tau) \perp \mathcal{F}_\tau\}\\ &= e^{aB_\tau}e^{\frac{1}{2}a^2 t}\\ \end{align*} \]

Thus, the conditional MGF is an explicit function of \(B_\tau\) and \(t\). This proves the proposition. \(\blacksquare\)

Proof of Theorem 2.

We first consider for fixed \(n\) the discrete valued stopping time:

\[ \tau_n = \frac{k + 1}{2^n}, \quad \text{ if } \frac{k}{2^n} \leq \tau < \frac{k+1}{2^n}, k\in \mathbb{N} \]

In other words, if \(\tau\) falls in the interval \([\frac{k}{2^n},\frac{k+1}{2^n})\), we stop at the next dyadic time \(\frac{k+1}{2^n}\). By construction, \(\tau_n\) is a stopping time: it depends only on the process up to the present. Consider the process \(W_t = B_{t + \tau_n} - B_{\tau_n}, t \geq 0\). We show it is a standard Brownian motion independent of \(\mathcal{F}_{\tau_n}\). This is feasible as we can decompose over the discrete values taken by \(\tau_n\). More precisely, take \(E \in \mathcal{F}_{\tau_n}\) and some generic event \(\{W_t \in A\}\) for the process \(W\). Then, by decomposing over the values of \(\tau_n\), we have:

\[ \begin{align*} \mathbb{P}(\{W_t \in A\} \cap E) &= \sum_{k=0}^\infty \mathbb{P}\left(\{W_t \in A\} \cap E \cap \{\tau_n = \frac{k}{2^n}\}\right)\\ &= \sum_{k=0}^\infty \mathbb{P}\left(\{(B_{t+k/2^n} - B_{k/2^n}) \in A\} \cap E \cap \{\tau_n = \frac{k}{2^n}\}\right)\\ &= \sum_{k=0}^\infty \mathbb{P}\left(\{(B_{t+k/2^n} - B_{k/2^n}) \in A\}\right) \times \mathbb{P}\left( E \cap \{\tau_n = \frac{k}{2^n}\}\right) \end{align*} \]

since \((B_{t+k/2^n} - B_{k/2^n})\) is independent of \(\mathcal{F}_{k/2^n}\) by Exercise 1 and since \(E \cap \{\tau_n = \frac{k}{2^n}\} \in \mathcal{F}_{k/2^n}\) by the definition of \(\mathcal{F}_{\tau_n}\). Moreover, the process \((B_{t+k/2^n} - B_{k/2^n}, t\geq 0)\) is itself a standard Brownian motion, so \(\mathbb{P}\{(B_{t+k/2^n} - B_{k/2^n}) \in A\} = \mathbb{P}\{B_t \in A\} = \mathbb{P}\{W_t \in A\}\); this probability does not depend on \(k\). The sum over \(k\) then yields:

\[ \mathbb{P}\left(\{W_t \in A\}\cap E\right) = \mathbb{P}(W_t \in A) \mathbb{P}(E) \]

as claimed. The extension to \(\tau\) is done by using continuity of paths. We have:

\[ \lim_{n \to \infty} B_{t + \tau_n} - B_{\tau_n} = B_{t+\tau} - B_{\tau} \text{ almost surely} \]

Note that this only uses right-continuity of the paths. Moreover, since each \(B_{t+\tau_n} - B_{\tau_n}\) is independent of \(\mathcal{F}_{\tau_n}\), and \(\mathcal{F}_\tau \subseteq \mathcal{F}_{\tau_n}\) (because \(\tau \leq \tau_n\)), the almost sure limit \(B_{t+\tau} - B_\tau\) is independent of \(\mathcal{F}_\tau\). The limiting distribution of the process is obtained by looking at the finite-dimensional distributions of the increments of \(B_{t+\tau_n} - B_{\tau_n}\) for a finite number of \(t\)’s and taking the limit as above. \(\blacksquare\)

Most diffusions also enjoy the strong Markov property, as long as the functions \(\sigma\) and \(\mu\) encoding the volatility and the drift are nice enough. This is the case for the diffusions we have considered.

Theorem 3 (Most diffusions are strong Markov) Consider a diffusion \((X_t,t\leq T)\) as in Theorem 1. Then, the diffusion has the strong Markov property.

The proof follows the lines of the proof of Theorem 1.

Proof.

Consider the time-homogeneous diffusion:

\[ dX_t = \mu(X_t)dt + \sigma(X_t)dB_t \]

By the existence and uniqueness theorem, this SIVP defines a unique continuous adapted process \((X_t,t \geq 0)\). Let \(\mathfrak{F}=(\mathcal{F}_t^X,t \geq 0)\) be the natural filtration of \((X_t, t\leq T)\). Let \(\tau\) be a stopping time for the filtration \(\mathfrak{F}\) and consider the process \(W_t = B_{t+\tau} - B_\tau\). From Theorem 2, we know that the process \((W_t,t\geq 0)\) is a standard Brownian motion independent of \(\mathcal{F}_\tau\). For \(s \geq 0\), we consider the SDE:

\[ dY_s = \mu(Y_s)ds + \sigma(Y_s)dW_s, \quad Y_0 = X_\tau \tag{7}\]

Again by the existence and uniqueness theorem, there exists a unique solution to the SIVP that is adapted to the natural filtration of \(W\). We claim that \((X_{s+\tau},s \geq 0)\) is the solution to this equation, since:

\[ X_{s+\tau} = X_\tau + \int_\tau^{s+\tau} \mu(X_u)du + \int_{\tau}^{s+\tau} \sigma(X_u)dB_u \]

Perform a change of variable \(v = u - \tau\). Then the limits of integration become \(v = 0\) and \(v = s\), and \(dv = du\).

\(dB_u \approx B_{u_2} - B_{u_1} = B(v_2 + \tau) - B(v_1 + \tau) = W(v_2) - W(v_1) = dW_v\).

\[ X_{s+\tau} = X_\tau + \int_0^{s} \mu(X_{v+\tau})dv + \int_{0}^{s} \sigma(X_{v+\tau})dW_v \]

If we let \(Y_0 = X_\tau\), \(Y_v = X_{v+\tau}\), we recover the dynamics of \((Y_v,v \geq 0)\) in Equation 7. So, \((X_{s+\tau},s\geq 0)\) is the solution to the SIVP in Equation 7. Thus, we conclude for any interval \(A\):

\[ \mathbb{P}(X_{s+\tau} \in A | \mathcal{F}_\tau^X) = \mathbb{P}(Y_s \in A| \mathcal{F}_\tau^X) \]

But, since \((Y_v,v\geq 0)\) depends on \(\mathcal{F}_\tau^X\) only through \(X_\tau\), we conclude that \(\mathbb{P}(X_{s + \tau} \in A | \mathcal{F}_\tau^X) = \mathbb{P}(X_{s + \tau} \in A| X_\tau)\). Consequently, \((X_t,t \geq 0)\) is a strong Markov process. \(\blacksquare\)

Note 2: Extension of optional sampling

Consider a continuous martingale \((M_t, t\leq T)\) for a filtration \((\mathcal{F}_t, t\geq 0)\) and a stopping time \(\tau\) for the same filtration. Suppose we would like to compute for some \(T\):

\[ \mathbb{E}[M_T \mathbf{1}_{\{\tau \leq T\}}] \]

It would be tempting to condition on \(\mathcal{F}_\tau\) and write \(\mathbb{E}[M_T |\mathcal{F}_\tau] = M_\tau\) on the event \(\{\tau \leq T\}\). We would then conclude that:

\[ \mathbb{E}[M_T 1_{\{\tau \leq T\}}] = \mathbb{E}[1_{\{\tau \leq T\}} \mathbb{E}[M_T|\mathcal{F}_\tau] ] = \mathbb{E}[M_\tau 1_{\{\tau \leq T\}}] \]

In some sense, we have extended the martingale property to stopping times. This property can be proved under reasonable assumptions on \((M_t,t\leq T)\) (for example, if it is positive). Indeed, it suffices to approximate \(\tau\) by the discrete-valued stopping times \(\tau_n\) as in the proof of Theorem 2. One can then apply the martingale property at fixed times.

The Heat Equation

We now look in more detail at how PDEs come up when computing quantities related to Markov processes.

Example 4 (Heat Equation and Brownian motion) Let \(f(t,x)\) be a function of time and space. The heat equation in \(1+1\)-dimension (one dimension of time, one dimension of space) is the PDE:

\[ \begin{align*} \frac{\partial f}{\partial t} &= \frac{1}{2}\frac{\partial^2 f}{\partial x^2} \end{align*} \tag{8}\]

In \(1+d\) (one dimension of time, \(d\) dimensions of space), the heat equation is:

\[ \begin{align*} \frac{\partial f}{\partial t} &= \frac{1}{2}\nabla^2 f \end{align*} \tag{9}\]

where \(\nabla^2\) is the Laplacian operator.

Let \((X_t,t \geq 0)\) be a Brownian motion whose initial position \(X_0\) has probability density:

\[ f(0,x) = g(x) \tag{10}\]

where \(g\) is a function of space.

Let \(f(t,u)\) be the probability density of the position \(X_t = u\) at time \(t\). By the law of total probability, we have (heuristically):

\[ \begin{align*} f(t,x) &\approx \sum_{y} \mathbb{P}\left\{X_0 = y \right\} \times \mathbb{P}\left\{X_t = x | X_0 = y \right\}\\ &= \int_{-\infty}^\infty g(y)\cdot p(x,t|y,0)dy\\ \end{align*} \]

Observe that, since the heat kernel is symmetric in \(x\) and \(y\) (that is, \(p(y,t|x,0) = p(x,t|y,0)\)):

\[ \begin{align*} \mathbb{E}[g(X_t)|X_0 = x] &= \int_{-\infty}^\infty g(y) \cdot p(y,t|x,0)dy\\ &=\int_{-\infty}^\infty g(y) \cdot p(x,t|y,0)dy \end{align*} \]

Thus, the function \(f\) can be represented as a specific type of space average: an average of \(g(B_t)\) over Brownian paths starting at \(x\):

\[ f(t,x) = \mathbb{E}[g(B_t)|B_0 = x] \tag{11}\]

Our claim is that \(f\) indeed satisfies the PDE (Equation 8).

The Gaussian transition probability density function (heat kernel) \(p(x,t|y,0)\) is given by:

\[ p(x,t|y,0) = \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{(x-y)^2}{2t}\right) \]

Differentiating \(p\) with respect to \(t\), we have:

\[ \begin{align*} \frac{\partial}{\partial t} p(x,t|y,0) &= \frac{\sqrt{2\pi t} \exp\left(-\frac{(x-y)^2}{2t}\right) \frac{\partial}{\partial t}\left(-\frac{(x-y)^2}{2t}\right) - \exp\left(-\frac{(x-y)^2}{2t}\right)\sqrt{2\pi}\left(\frac{1}{2\sqrt{t}}\right)}{2\pi t}\\ &=\sqrt{2\pi}\exp\left(-\frac{(x-y)^2}{2t}\right) \frac{\frac{(x-y)^2}{2t^{3/2}} - \frac{t}{2t^{3/2}}}{2\pi t}\\ &= \exp\left(-\frac{(x-y)^2}{2t}\right) \frac{(x-y)^2 - t}{\sqrt{2\pi} (2t^{5/2}) } \end{align*} \tag{12}\]

Differentiating \(p\) with respect to \(x\), we have:

\[ \begin{align*} \frac{\partial }{\partial x} p(x,t|y,0) &= \frac{1}{\sqrt{2\pi t}}\exp\left[-\frac{(x-y)^2}{2t}\right]\frac{\partial}{\partial x}\left(-\frac{(x-y)^2}{2t}\right)\\ &= \frac{1}{\sqrt{2\pi t}} \cdot \left(-\frac{1}{\cancel{2} t}\right) \exp\left[-\frac{(x-y)^2}{2t}\right] \cdot \cancel{2}(x-y)\\ &= -\frac{1}{t\sqrt{2\pi t}} (x-y)\exp\left[-\frac{(x-y)^2}{2t}\right] \end{align*} \tag{13}\]

Differentiating again with respect to space, we have:

\[ \begin{align*} \frac{\partial^2}{\partial x^2} p(x,t|y,0) &= -\frac{1}{t\sqrt{2\pi t}} \left[\exp\left\{-\frac{(x-y)^2}{2t}\right\} + (x-y)\exp\left\{-\frac{(x-y)^2}{2t}\right\}\left(-\frac{2(x-y)}{2t}\right)\right]\\ &=-\frac{1}{t\sqrt{2\pi t}}\exp\left\{-\frac{(x-y)^2}{2t}\right\} \left[1 - \frac{(x-y)^2}{t}\right]\\ &=\frac{1}{t\sqrt{2\pi t}}\exp\left\{-\frac{(x-y)^2}{2t}\right\} \left[\frac{(x-y)^2 - t}{t}\right]\\ &=\frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{(x-y)^2}{2t}\right\} \cdot \frac{(x-y)^2 - t}{t^{5/2}} \end{align*} \tag{14}\]

From Equation 12 and Equation 14, it follows that:

\[ \frac{\partial}{\partial t} p(x,t|y,0) = \frac{1}{2}\frac{\partial ^2}{\partial x^2} p(x,t|y,0) \]

Thus,

\[ \begin{align*} \frac{\partial}{\partial t} \int_{-\infty}^\infty g(y) p(x,t|y,0)dy &= \frac{1}{2}\frac{\partial^2}{\partial x^2} \int_{-\infty}^\infty g(y) p(x,t|y,0)dy \\ \frac{\partial }{\partial t}f(t,x) &= \frac{1}{2}\frac{\partial^2 }{\partial x^2} f(t,x) \end{align*} \]
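The computation above can also be double-checked symbolically. The short sketch below (an optional addition, using sympy) verifies that the heat kernel satisfies the heat equation (Equation 8).

import sympy as sp

# symbolic check that the heat kernel solves the heat equation (Equation 8)
x, y, t = sp.symbols('x y t', positive=True)
p = sp.exp(-(x - y) ** 2 / (2 * t)) / sp.sqrt(2 * sp.pi * t)
lhs = sp.diff(p, t)
rhs = sp.Rational(1, 2) * sp.diff(p, x, 2)
print(sp.simplify(lhs - rhs))  # expect 0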

Robert Brown’s erratic motion of pollen

In the summer of 1827, the Scottish botanist Robert Brown observed that microscopic pollen grains suspended in water move in an erratic, highly irregular, zigzag pattern. It was only in 1905 that Albert Einstein provided a satisfactory explanation of Brownian motion. He asserted that Brownian motion originates in the continual bombardment of the pollen grains by the molecules of the surrounding water, so that the suspended particles acquire the same average kinetic energy as the water molecules. He then showed that Brownian motion provides a solution (in a certain sense) to Fourier’s famous heat equation

\[ \frac{\partial u}{\partial t}(t,x) = \kappa \frac{\partial^2 u}{\partial x^2}(t,x) \]

Albert Einstein’s proof of the existence of Brownian motion

We now summarize Einstein’s original 1905 argument. Suppose we are interested in the motion along the horizontal \(x\)-axis, and that we drop Brownian particles into a liquid. Let \(f(t,x)\) represent the number of particles per unit volume (the density) at position \(x\) at time \(t\). So, the number of particles in a small interval \(I=[x,x+dx]\) of width \(dx\) will be \(f(t,x)dx\).

Now, as time progresses, the number of particles in this interval \(I\) will change. The Brownian particles will zigzag upon bombardment by the molecules of the liquid. Some particles will move out of the interval \(I\), while other particles will move in.

Let’s consider a time step of length \(\tau\). Einstein’s probabilistic approach was to model the displacement of a particle over this time step as a random variable \(\Delta\). To determine how many particles end up in the interval \(I\), we start with the region to the right of the interval \(I\).

The density of particles at \(x+\Delta\) is \(f(t,x+\Delta)\); the number of particles in a small interval of length \(dx\) is \(f(t,x+\Delta)dx\). If we represent the probability density of the displacement by \(\phi(\Delta)\), then the number of particles at \(x+\Delta\) that will move to \(x\) will be \(dx \cdot f(t,x+\Delta)\phi(\Delta)\). We can apply the same logic to the left hand side. The number of particles at \(x - \Delta\) that will move to \(x\) will be \(dx \cdot f(t,x-\Delta)\phi(-\Delta)\). Assume that \(\phi(\Delta) = \phi(-\Delta)\).

Now, if we integrate these movements across the real line, then we get the number of particles at \(x\) at a short time later \(t + \tau\).

\[ f(t+ \tau,x) dx = dx \int_{-\infty}^{\infty} f(t,x+\Delta) \phi(\Delta) d\Delta \]

Now, we can get rid of \(dx\).

\[ f(t+ \tau,x) = \int_{-\infty}^{\infty} f(t,x+\Delta) \phi(\Delta) d\Delta \tag{15}\]

The Taylor series expansion of \(f(t+\tau,x)\) centered at \(t\) (holding \(x\) constant) is:

\[ f(t + \tau,x) = f(t,x) + \frac{\partial f}{\partial t}\tau + O(\tau^2) \]

The Taylor series expansion of \(f(t,x+\Delta)\) centered at \(x\) (holding \(t\) constant) is:

\[ f(t,x+\Delta) = f(t,x) + \frac{\partial f}{\partial x}\Delta + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\Delta^2 + O(\Delta^3) \]

We can now substitute these into Equation 15 to get:

\[ \begin{align*} f(t,x) + \frac{\partial f}{\partial t}\tau &= \int_{-\infty}^{\infty}\left(f(t,x) + \frac{\partial f}{\partial x}\Delta + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\Delta^2\right) \phi(\Delta)d\Delta\\ &= f(t,x) \int_{-\infty}^{\infty} \phi(\Delta)d\Delta \\ &+ \frac{\partial f} {\partial x} \int_{-\infty}^{\infty} \Delta \phi(\Delta)d\Delta \\ &+ \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\int_{-\infty}^{\infty}\Delta^2 \phi(\Delta)d\Delta \end{align*} \]

Now, since the probability density of the displacement \(\phi(\cdot)\) is symmetric about the origin, the second integral is zero. And since a probability density integrates to one over \(\mathbb{R}\), the first integral equals one. So, we get:

\[ f(t,x) + \frac{\partial f}{\partial t}\tau = f(t,x) + \frac{1}{2}\frac{\partial^2 f}{\partial x^2}\int_{-\infty}^{\infty}\Delta^2 \phi(\Delta)d\Delta \]

Now, we can cancel \(f(t,x)\) on both sides and divide through by \(\tau\):

\[ \frac{\partial f}{\partial t} = \left(\frac{1}{2\tau} \int_{-\infty}^{\infty}\Delta^2 \phi(\Delta)d\Delta \right)\frac{\partial^2 f}{\partial x^2} \]

Define \(D:= \left(\frac{1}{2\tau} \int_{-\infty}^{\infty}\Delta^2 \phi(\Delta)d\Delta \right)\). Then, we have:

\[ \frac{\partial f}{\partial t} = D\frac{\partial^2 f}{\partial x^2} \]

The microscopic interpretation of the diffusion coefficient is that it is, up to the factor \(2\tau\), the mean squared displacement per time step. The larger \(D\) is, the faster the Brownian particles spread out.
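A small simulation can illustrate this interpretation. The sketch below (an addition, with arbitrary values of \(D\) and \(\tau\)) draws displacements with \(\mathbb{E}[\Delta^2] = 2D\tau\) per time step and checks that the mean squared displacement after time \(t\) is close to \(2Dt\).

import numpy as np

# mean squared displacement of simulated particles vs 2*D*t (illustrative parameters)
rng = np.random.default_rng(2)
D, tau, n_steps, n_particles = 0.5, 0.01, 500, 5_000
steps = rng.normal(0.0, np.sqrt(2.0 * D * tau), size=(n_particles, n_steps))
positions = np.cumsum(steps, axis=1)
t_total = n_steps * tau
print(np.mean(positions[:, -1] ** 2), 2.0 * D * t_total)  # should be close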

Kolmogorov’s Backward Equation

Think of \(y\) and \(t\) as the current state and time, and \(y'\) and \(t'\) as the future state and time. The transition probability density function \(p(y',t'|y,t)\) of a diffusion satisfies two equations: one involving derivatives with respect to the future state and time (\(y'\) and \(t'\)), called the forward equation, and the other involving derivatives with respect to the current state and time (\(y\) and \(t\)), called the backward equation. These two equations are parabolic partial differential equations, not dissimilar to the Black-Scholes equation.

Theorem 4 (Backward equation with initial value) Let \((X_t,t\geq 0)\) be a diffusion in \(\mathbb{R}\) with the SDE:

\[ dX_t = \sigma(X_t)dB_t + \mu(X_t) dt \]

Let \(g\in C^2(\mathbb{R})\) be such that \(g\) is \(0\) outside an interval. Then, the solution of the PDE with initial value

\[ \begin{align*} \frac{\partial f}{\partial t}(t,x) &= \frac{\sigma(x)^2}{2}\frac{\partial^2 f}{\partial x^2} + \mu(x)\frac{\partial f}{\partial x}\\ f(0,x) &= g(x) \end{align*} \tag{16}\]

has the representation:

\[ f(t,x) = \mathbb{E}[g(X_t)|X_0 = x] \]

Proof.

Step 1. Let’s fix \(t\) and consider the function of space \(h(x)=f(t,x)=\mathbb{E}[g(X_t)|X_0=x]\). Applying Ito’s formula to \(h\), we have:

\[\begin{align} dh(X_s) &= h'(X_s) dX_s + \frac{1}{2}h''(X_s) (dX_s)^2\\ &= h'(X_s) (\sigma(X_s)dB_s + \mu(X_s) ds) + \frac{\sigma(X_s)^2}{2}h''(X_s)ds\\ &= \sigma(X_s)h'(X_s)dB_s + \left(\frac{\sigma(X_s)^2}{2}h''(X_s) + \mu(X_s)h'(X_s)\right)ds \end{align}\]

In the integral form this is:

\[\begin{align*} h(X_s) - h(X_0) &= \int_0^s \sigma(X_u)h'(X_u)dB_u \\ &+ \int_0^s \left(\frac{\sigma(X_u)^2}{2}h''(X_u) + \mu(X_u)h'(X_u)\right)du \end{align*}\]

Step 2. Take expectations on both sides, divide by \(s\) and let \(s \to 0\). We are interested in taking the derivative with respect to \(s\) at \(s_0=0\).

The expectation of the first term on the right hand side is zero, by the properties of the Ito integral.

The integrand of the second term on the right-hand side is a conditional expectation \(\mathbb{E}[\xi(X_u)|X_0 = x]\), where \(\xi(x) = \frac{\sigma(x)^2}{2}h''(x) + \mu(x)h'(x)\). It is an average, at time \(u\), over the paths of the process starting at the initial position \(X_0 = x\), so it is a function of \(u\) and \(x\): write \(\mathbb{E}[\xi(X_u)|X_0 = x] = p(u,x)\). Suppressing the argument \(x\), the second term has the representation:

\[\begin{align} \int_0^s p(u) du \end{align}\]

Assuming \(p\) is continuous in \(u\), the fundamental theorem of calculus guarantees that its antiderivative

\[ P(s) = \int_{0}^{s}p(u)du \]

is differentiable with \(P'(s) = p(s)\).

By the definition of the derivative:

\[P'(0) = \lim_{s \to 0} \frac{P(s) - P(0)}{s} = \lim_{s\to 0} \frac{P(s)}{s} = p(0) \quad \{ P(0)=0 \text{ by definition }\}\]

Thus, we have:

\[ p(0,x) = \mathbb{E}[\xi(X_0)|X_0 = x] = \frac{\sigma(x)^2}{2} h''(x) + \mu(x)h'(x) \]

Step 3. As for the left-hand side, we have:

\[ \lim_{s \to 0} \frac{\mathbb{E}[h(X_s)|X_0 = x] - h(X_0)}{s} = \lim_{s \to 0} \frac{\mathbb{E}[h(X_s)|X_0 = x] - f(t,x)}{s} \]

To prove that this limit is \(\frac{\partial f}{\partial t}(t,x)\), it remains to show that \(\mathbb{E}[h(X_s)|X_0 = x]=\mathbb{E}[g(X_{t+s})|X_0 = x]=f(t+s,x)\).

To see this, note that \(h(X_s) = \mathbb{E}[g(X_{t+s})|X_s]\). We deduce:

\[\begin{align*} \mathbb{E}[h(X_s)|X_0 = x] &= \mathbb{E}[\mathbb{E}[g(X_{t+s})|X_s]|X_0 = x]\\ &= \mathbb{E}[\mathbb{E}[g(X_{t+s})|\mathcal{F}_s]|X_0 = x]\\ & \{ (X_t,t\geq 0) \text{ is Markov }\} \\ &= \mathbb{E}[g(X_{t+s})|X_0 = x]\\ & \{ \text{ Tower property }\} \\ &= f(t+s,x) \end{align*}\]

This closes the proof. \(\blacksquare\)

The backward equation (Equation 16) can be conveniently written in terms of the generator of the diffusion.

Definition 4 (Generator of a diffusion) The generator of a diffusion with SDE \(dX_t = \sigma(X_t) dB_t + \mu(X_t)dt\) is the differential operator acting on functions of space defined by:

\[ A = \frac{\sigma(x)^2}{2}\frac{\partial^2 }{\partial x^2} + \mu(x)\frac{\partial}{\partial x} \]

With this notation, the backward equation for the function \(f(t,x)\) takes the form:

\[ \frac{\partial f}{\partial t}(t,x) = Af(t,x) \]

where it is understood that \(A\) acts only on the space variable. Theorem 4 gives a nice interpretation of the generator: it quantifies how much the function \(f(t,x) = \mathbb{E}[g(X_t)|X_0 = x]\) changes in a small time interval.

Example 5 (Generator of the Ornstein Uhlenbeck Process) The SDE of the Ornstein-Uhlenbeck process is:

\[ dX_t = dB_t - X_t dt \]

This means that its generator is:

\[ A = \frac{1}{2}\frac{\partial^2}{\partial x^2} - x \frac{\partial}{\partial x} \]
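The interpretation of the generator as a short-time rate of change can be checked numerically for the Ornstein-Uhlenbeck process. For \(dX_t = dB_t - X_t dt\), the conditional law of \(X_h\) given \(X_0 = x\) is Gaussian with mean \(xe^{-h}\) and variance \((1-e^{-2h})/2\), so \(\mathbb{E}[g(X_h)|X_0=x]\) is available in closed form for \(g(x) = x^2\). The sketch below is an illustration only; the values of \(x\) and \(h\) are arbitrary, and \(Ag(x) = 1 - 2x^2\).

import numpy as np

# generator of the OU process as a short-time rate of change, with g(x) = x**2
x, h = 0.7, 1e-4
mean_h = x * np.exp(-h)                      # E[X_h | X_0 = x]
var_h = (1.0 - np.exp(-2.0 * h)) / 2.0       # Var[X_h | X_0 = x]
expected_g = mean_h ** 2 + var_h             # E[X_h**2 | X_0 = x]
finite_difference = (expected_g - x ** 2) / h
generator_value = 1.0 - 2.0 * x ** 2         # A g(x) = (1/2) g''(x) - x g'(x)
print(finite_difference, generator_value)    # should be close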

Example 6 (Generator of Geometric Brownian Motion) Recall that the geometric Brownian motion

\[ S_t = S_0 \exp(\sigma B_t + \mu t) \]

satisfies the SDE:

\[ dS_t = \sigma S_t dB_t + \left(\mu + \frac{\sigma^2}{2}\right) S_t dt \]

In particular, the generator of geometric Brownian motion is:

\[ A = \frac{\sigma^2 x^2}{2} \frac{\partial^2}{\partial x^2} + \left(\mu + \frac{\sigma^2}{2}\right)x\frac{\partial}{\partial x} \]

For applications, in particular in mathematical finance, it is important to solve the backward equation with terminal value instead of with initial value. The reversal of time causes the appearance of an extra minus sign in the equation.

Theorem 5 (Backward equation with terminal value) Let \((X_t,t\leq T)\) be a diffusion with the dynamics:

\[ dX_t = \sigma(X_t) dB_t + \mu(X_t)dt \]

Let \(g\in C^2(\mathbb{R})\) be such that \(g\) is \(0\) outside an interval. Then, the solution of the PDE with terminal value at time \(T\)

\[ \begin{align*} -\frac{\partial f}{\partial t} &= \frac{\sigma(x)^2}{2}\frac{\partial^2 f}{\partial x^2} + \mu(x)\frac{\partial f}{\partial x}\\ f(T,x) &= g(x) \end{align*} \tag{17}\]

has the representation:

\[ f(t,x) = \mathbb{E}[g(X_T)|X_t = x] \]

Note 3: Backward equation with terminal value appears in the martingale condition

One way to construct a martingale for the filtration \((\mathcal{F}_t,t\geq 0)\) is to take

\[ M_t = \mathbb{E}[Y | \mathcal{F}_t] \]

where \(Y\) is some integrable random variable. The martingale property then follows from the tower property of the conditional expectation. In the setup of Theorem 5, the random variable \(Y\) is \(g(X_T)\). By the Markov property of diffusion, we therefore have:

\[ f(t,X_t) = \mathbb{E}[g(X_T)|X_t] = \mathbb{E}[g(X_T)|\mathcal{F}_t] \]

In other words, the solution to the backward equation with terminal value, evaluated at \(x = X_t\), yields a martingale for the natural filtration of the process. This is a different point of view on the procedure we have used many times now: to get a martingale of the form \(f(t,X_t)\), apply Ito’s formula to \(f(t,X_t)\) and set the \(dt\) term to zero. The PDE we obtain is the backward equation with terminal value. In fact, the proof of the theorem takes this exact route.

Proof.

Consider \(f(t,X_t)\) and apply Ito’s formula.

\[ \begin{align*} df(t,X_t) &= \frac{\partial f}{\partial t} dt + \frac{\partial f}{\partial x}dX_t + \frac{1}{2}\frac{\partial^2 f}{\partial x^2} dX_t \cdot dX_t\\ &= \frac{\partial f}{\partial t} dt + \frac{\partial f}{\partial x}(\sigma(X_t) dB_t + \mu(X_t)dt) + \frac{\sigma(X_t)^2}{2}\frac{\partial^2 f}{\partial x^2} dt\\ &= \sigma(X_t)\frac{\partial f}{\partial x} dB_t + \left(\frac{\partial f}{\partial t} + \frac{\sigma(X_t)^2}{2}\frac{\partial^2 f}{\partial x^2} + \mu(X_t)\frac{\partial f}{\partial x}\right)dt \end{align*} \]

Since \(f(t,x)\) is a solution to the equation, we get that the \(dt\) term is \(0\) and \(f(t,X_t)\) is a martingale for the Brownian filtration (and thus also for the natural filtration of the diffusion, which contains less information). In particular we have:

\[ f(t,X_t) = \mathbb{E}[f(T,X_T)|\mathcal{F}_t] = \mathbb{E}[g(X_T)|\mathcal{F}_t] \]

Since \((X_t,t\leq T)\) is a Markov process, we finally get:

\[ f(t,x) = \mathbb{E}[g(X_T)|X_t = x] \]

Example 7 (Martingales of geometric Brownian motion) Let \((S_t, t\geq 0)\) be a geometric Brownian motion with SDE:

\[ dS_t = \sigma S_t dB_t + \left(\mu + \frac{\sigma^2}{2}\right)S_t\, dt \]

As we saw in Example 6, its generator is:

\[ A = \frac{\sigma^2 x^2}{2}\frac{\partial^2}{\partial x^2} + x\left(\mu+\frac{\sigma^2}{2}\right)\frac{\partial}{\partial x} \]

In view of Theorem 5, if \(f(t,x)\) satisfies the PDE

\[ \frac{\partial f}{\partial t} + \frac{\sigma^2 x^2}{2}\frac{\partial^2 f}{\partial x^2} + x\left(\mu+\frac{\sigma^2}{2}\right)\frac{\partial f}{\partial x} = 0 \]

then processes of the form \(f(t,S_t)\) will be martingales for the natural filtration.

Kolmogorov’s forward equation

The companion equation to the backward equation is the Kolmogorov forward equation, or simply the forward equation. It is also known as the Fokker-Planck equation, from its origin in physics. The equation is very useful, as it is satisfied by the transition density function \(p(y',t'|y,t)\) of a time-homogeneous diffusion. It involves the adjoint of the generator.

Definition 5 (Adjoint of the generator) The adjoint \(A^*\) of the generator of a diffusion \((X_t,t\geq 0)\) with SDE:

\[ dX_t = \sigma(X_t)dB_t + \mu(X_t)dt \]

is the differential operator acting on a function of space \(f(x)\) as follows:

\[ A^*f(x) = \frac{1}{2}\frac{\partial^2 }{\partial x^2}\left( \sigma(x)^2 f(x)\right) - \frac{\partial }{\partial x}\left(\mu(x)f(x)\right) \tag{18}\]

Note the differences with the generator in Definition 4: there is an extra minus sign and the derivatives also act on the volatility and the drift.

Example 8 (The generator of Brownian motion is self-adjoint) In the case of standard Brownian motion, it is easy to check that:

\[ A^* = \frac{1}{2}\frac{\partial^2}{\partial x^2} \]

and

\[ A^* = \frac{1}{2}\nabla^2 \]

in the multivariate case. In other words, the generator and its adjoint are the same. In this case, the operator is self-adjoint.

Example 9 We see that the adjoint of the generator acting on \(f(x)\) for geometric Brownian motion is:

\[ A^*f(x) = \frac{1}{2}\frac{\partial^2}{\partial x^2} (\sigma^2 x^2 f(x)) - \frac{\partial}{\partial x} \left(\left(\mu + \frac{\sigma^2}{2}\right) x f(x)\right) \]

Using the product rule to differentiate, we get:

\[ A^*f(x) = \frac{\sigma^2}{2}\left(2 f(x) + 4x f'(x) + x^2 f''(x)\right) - \left(\mu + \frac{\sigma^2}{2}\right)\left(f(x) + x f'(x)\right) \]

Example 10 The generator for the Ornstein-Uhlenbeck process was given in Example 5. The adjoint acting on \(f\) is therefore:

\[ \begin{align*} A^*f(x) &= \frac{1}{2}\frac{\partial^2}{\partial x^2}(f(x)) - \frac{\partial}{\partial x}(- x f(x))\\ &= \frac{f''(x)}{2} + (f(x)+xf'(x)) \end{align*} \]

The forward equation takes the following form for a function \(f(t,x)\) of time and space:

\[ \frac{\partial f}{\partial t} = A^* f \tag{19}\]

For Brownian motion, since \(A^* = A\), the backward and forward equations are the same. As advertised earlier, the forward equation is satisfied by the transition density \(p(y',t'|y,t)\) of a diffusion. Before showing this in general, we verify it in the Brownian case.

Example 11 Recall that the transition probability density \(p(y,t|x,0)\) for Brownian motion, or heat kernel, is:

\[ p(y,t|x,0) = \frac{e^{-\frac{(y-x)^2}{2t}}}{\sqrt{2\pi t}} \]

Here, the space variable will be \(y\) and \(x\) will be fixed. The relevant function is thus \(f(t,y) = p(y,t|x,0)\). The adjoint operator acting on the space variable \(y\) is \(A^* = A = \frac{1}{2}\frac{\partial^2}{\partial y^2}\). The relevant time and space derivatives are given by Equation 12 and Equation 14.

We conclude that \(f(t,y)=p(y,t|x,0)\) is a solution of the forward equation.

Where does the form of the adjoint operator Equation 18 come from? In some sense, the adjoint operator plays a role similar to that of the transpose of a matrix in linear algebra. The adjoint acts on the function on the left. To see this, consider two functions \(f,g\) of space on which the generator \(A\) of a diffusion is well-defined. In particular, let’s assume that the functions are zero outside an interval. Consider the quantity

\[ \int_{\mathbb{R}}g(x)A(f(x))dx = \int_{\mathbb{R}} g(x)\left(\frac{\sigma(x)^2 }{2}f''(x) + \mu(x)f'(x)\right)dx \]

This quantity can represent for example the average of \(Af(x)\) over some PDF \(g(x)\). In the above, \(A\) acts on the function on the right. To make the operator act on \(g\), we integrate by parts. This gives for the second term:

\[ \int_{\mathbb{R}} g(x)\mu(x)f'(x)dx = g(x)\mu(x)f(x)\Bigg|_{-\infty}^{\infty}-\int_{\mathbb{R}}f(x)\frac{d}{dx}(g(x)\mu(x))dx \]

The boundary term \(g(x)f(x)\mu(x)\Bigg|_{-\infty}^\infty\) is \(0\) by the assumptions on \(f,g\). The term involving \(\sigma\) is obtained by integrating by parts twice:

\[ \begin{align*} \int_{\mathbb{R}} g(x) \frac{\sigma(x)^2}{2}f''(x)dx &= g(x) \frac{\sigma(x)^2}{2}f'(x)\Bigg|_{-\infty}^{\infty} - \int_{\mathbb{R}}\frac{d}{dx}\left(g(x) \frac{\sigma(x)^2}{2}\right) f'(x)dx\\ -\int_{\mathbb{R}} \frac{d}{dx}\left(g(x) \frac{\sigma(x)^2}{2}\right)f'(x)dx &= -\frac{d}{dx}\left(g(x) \frac{\sigma(x)^2}{2}\right)f(x) \Bigg|_{-\infty}^{\infty} + \int_{\mathbb{R}}\frac{d^2}{dx^2}\left(g(x) \frac{\sigma(x)^2}{2}\right)f(x)dx \end{align*} \]

Thus,

\[ \begin{align*} \int_{\mathbb{R}}g(x) Af(x)dx &= \int_{\mathbb{R}}\left(\frac{1}{2}\frac{d^2}{dx^2}(g(x) \sigma(x)^2) - \frac{d}{dx}(g(x)\mu(x))\right)f(x)dx\\ &= \int_{\mathbb{R}}(A^*g(x))f(x)dx \end{align*} \tag{20}\]
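Equation 20 can be verified symbolically on a concrete example. The sketch below (an addition for illustration) uses the Ornstein-Uhlenbeck coefficients \(\sigma = 1\), \(\mu(x) = -x\) and polynomial test functions that vanish, together with their first derivatives, at \(x = \pm 1\), so that all boundary terms drop out and the integrals can be restricted to \([-1,1]\).

import sympy as sp

# symbolic check of the adjoint identity (Equation 20) for sigma = 1, mu(x) = -x
x = sp.symbols('x')
sigma2, mu = sp.Integer(1), -x
f = (1 - x ** 2) ** 3            # test function supported on [-1, 1]
g = x * (1 - x ** 2) ** 3        # a second, asymmetric test function

A_f = sigma2 / 2 * sp.diff(f, x, 2) + mu * sp.diff(f, x)
Astar_g = sp.Rational(1, 2) * sp.diff(sigma2 * g, x, 2) - sp.diff(mu * g, x)

lhs = sp.integrate(g * A_f, (x, -1, 1))
rhs = sp.integrate(Astar_g * f, (x, -1, 1))
print(sp.simplify(lhs - rhs))    # expect 0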

Theorem 6 (Forward equation and transition probability) Let \((X_t,t\geq 0)\) be a diffusion with SDE:

\[ dX_t = \sigma(X_t)dB_t + \mu(X_t)dt, \quad X_0 = x_0 \]

Let \(p(x,t|x_0,0)\) be the transition probability density function for a fixed \(x_0\). Then, the function \(f(t,y) = p(y,t|x_0,0)\) is a solution of the PDE

\[ \frac{\partial f}{\partial t} = A^* f \]

where \(A^*\) is the adjoint of \(A\).

Proof.

Let \(h(x)\) be some arbitrary function of space that is \(0\) outside an interval. We compute :

\[ \frac{1}{\epsilon}\left(\mathbb{E}[h(X_{t+\epsilon})] - \mathbb{E}[h(X_t)]\right) \]

two different ways and take the limit as \(\epsilon \to 0\).

On one hand, we have by the definition of the transition density

\[ \frac{1}{\epsilon}\left(\mathbb{E}[h(X_{t+\epsilon})]-\mathbb{E}[h(X_t)]\right) = \int_{\mathbb{R}}\frac{1}{\epsilon}(p(x,t+\epsilon|x_0,0) - p(x,t|x_0,0))h(x)dx \]

By taking the limit \(\epsilon \to 0\) inside the integral (assuming this is fine), we get:

\[ \int_{\mathbb{R}} \frac{\partial}{\partial t}p(x,t|x_0,0)h(x)dx \tag{21}\]

On the other hand, Ito’s formula implies

\[ \begin{align*} dh(X_s) &= \frac{\partial h}{\partial x} dX_s + \frac{1}{2} \frac{\partial^2 h}{\partial x^2} (dX_s)^2\\ &= \frac{\partial h}{\partial x} (\sigma(X_s) dB_s + \mu(X_s)ds) + \frac{1}{2} \frac{\partial^2 h}{\partial x^2} (\sigma(X_s)^2 ds)\\ &= \sigma(X_s)\frac{\partial h}{\partial x} dB_s + \left(\mu(X_s) \frac{\partial h}{\partial x} + \frac{\sigma(X_s)^2}{2}\frac{\partial^2 h}{\partial x^2}\right)ds\\ h(X_{t+\epsilon}) - h(X_t) &= \int_{t}^{t+\epsilon}\sigma(X_s)\frac{\partial h}{\partial x} dB_s + \int_{t}^{t+\epsilon}Ah(X_s)\,ds\\ \mathbb{E}[h(X_{t+\epsilon})] - \mathbb{E}[h(X_t)] &= \underbrace{\mathbb{E}\left[\int_{t}^{t+\epsilon}\sigma(X_s)\frac{\partial h}{\partial x} dB_s\right]}_{0} + \int_{t}^{t+\epsilon}\mathbb{E}[Ah(X_s)]ds \end{align*} \]

Dividing by \(\epsilon\) and taking the limit as \(\epsilon \to 0\), we have:

\[ \begin{align*} \lim_{\epsilon \to 0} \frac{1}{\epsilon} (\mathbb{E}[h(X_{t+\epsilon})] - \mathbb{E}[h(X_t)]) &= \mathbb{E}[Ah(X_t)]\\ &= \int_{\mathbb{R}} p(x,t|x_0,0) Ah(x) dx \end{align*} \]

This can be written using Equation 20 as,

\[ \int_{\mathbb{R}}(A^* p(x,t|x_0,0)) h(x) dx \]

Since \(h\) is arbitrary, we conclude that:

\[ \frac{\partial}{\partial t}p(x,t|x_0,0) = A^* p(x,t|x_0,0) \tag{22}\]

Example 12 (Forward equation and invariant probability.) The Ornstein-Uhlenbeck process converges to a stationary distribution, as noted in an earlier example. For example, for the SDE of the form

\[ dX_t = -X_t dt + dB_t \]

with \(X_0\) Gaussian with mean \(0\) and variance \(1/2\), the PDF of \(X_t\) is, for all \(t\):

\[ f(x) = \frac{1}{\sqrt{\pi}} e^{-x^2} \tag{23}\]

This invariant distribution can be seen from the point of view of the forward equation. Indeed since the PDF is constant in time, the forward equation simply becomes:

\[ A^* f = 0 \tag{24}\]
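This can be verified symbolically; the short check below (an optional addition) applies the adjoint of the Ornstein-Uhlenbeck generator from Example 10 to the density in Equation 23.

import sympy as sp

# check that the stationary density of the OU process satisfies A* f = 0
x = sp.symbols('x')
f = sp.exp(-x ** 2) / sp.sqrt(sp.pi)
Astar_f = sp.Rational(1, 2) * sp.diff(f, x, 2) - sp.diff(-x * f, x)   # sigma = 1, mu(x) = -x
print(sp.simplify(Astar_f))  # expect 0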

Example 13 The SDE of the Ornstein-Uhlenbeck process can be generalized as follows. Consider \(V(x)\), a smooth function of space such that \(\int_{\mathbb{R}} e^{-2V(x)}dx<\infty\). The Smoluchowski equation is the SDE of the form:

\[ dX_t = dB_t - V'(X_t) dt \tag{25}\]

The SDE can be interpreted as follows: \(X_t\) represents the position of a particle on \(\mathbb{R}\). The position varies due to the Brownian fluctuations and also due to a force \(-V'(X_t)\) that depends on the position. The function \(V(x)\) should then be thought of as the potential in which the particle moves, since in Newtonian physics the force (field) is the negative derivative of the potential function. The generator of this diffusion is:

\[ A = \frac{1}{2}\frac{\partial^2}{\partial x^2} - V'(x)\frac{\partial}{\partial x} \]

This diffusion admits an invariant distribution:

\[ f(x) = Ce^{-2V(x)} \]

where \(C\) is such that \(\int_{\mathbb{R}}f(x)dx = 1\).
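The same kind of verification works for a generic smooth potential; the sketch below (an optional addition) checks symbolically that \(f(x) = e^{-2V(x)}\) satisfies \(A^*f = 0\) for the Smoluchowski diffusion in Equation 25.

import sympy as sp

# symbolic check that exp(-2*V) solves A* f = 0 for dX = dB - V'(X) dt
x = sp.symbols('x')
V = sp.Function('V')(x)
f = sp.exp(-2 * V)
mu = -sp.diff(V, x)
Astar_f = sp.Rational(1, 2) * sp.diff(f, x, 2) - sp.diff(mu * f, x)
print(sp.simplify(Astar_f))  # expect 0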

The Feynman-Kac Formula

We saw in Example 4 that the solution of the heat equation:

\[ \frac{\partial f}{\partial t} = \frac{1}{2}\frac{\partial^2 f}{\partial x^2} \]

can be represented as an average over Brownian paths. This representation was extended to diffusions in Theorem 4, where the second derivative in the equation is replaced by the generator of the corresponding diffusion. How robust is this representation? In other words, is it possible to slightly change the PDE and still get a stochastic representation for the solution? The answer is yes when a term of the form \(r(x)f(t,x)\) is added to the equation, where \(r(x)\) is a well-behaved function of space (for example, piecewise continuous). The stochastic representation of the PDE in this case bears the name Feynman-Kac formula, the fruit of a collaboration between the physicist Richard Feynman and the mathematician Mark Kac. (By the way, “Kac” is pronounced “cats”; the name is Polish, and people who immigrated from Poland before him often spelled the name “Katz”.) The case where \(r(x)\) is linear will be important in the applications to mathematical finance, where it represents the contribution of the interest rate.

Theorem 7 (Initial Value Problem) Let \((X_t,t\geq 0)\) be a diffusion in \(\mathbb{R}\) with the SDE:

\[ dX_t = \sigma(X_t) dB_t + \mu(X_t)dt \]

Let \(g\in C^2(\mathbb{R})\) be such that \(g\) is \(0\) outside an interval. Then, the solution of the PDE with initial value

\[ \begin{align*} \frac{\partial f}{\partial t}(t,x) &= \frac{\sigma(x)^2}{2}\frac{\partial^2 f}{\partial x^2}(t,x) + \mu(x)\frac{\partial f}{\partial x}(t,x) - r(x)f(t,x)\\ f(0,x) &= g(x) \end{align*} \tag{26}\]

has the stochastic representation:

\[ f(t,x) = \mathbb{E}\left[g(X_t)\exp\left(-\int_0^t r(X_s) ds\right)\Bigg| X_0 = x\right] \]

Proof.

The proof is again based on Ito’s formula. For a fixed \(t\), we consider the process:

\[ M_s = f(t-s, X_s) \exp\left(-\int_0^s r(X_u) du\right), \quad s \leq t \]

Write \(Z_s = \exp\left(-\int_0^s r(X_u) du\right)\) and \(V_s = f(t-s,X_s)\), so that \(M_s = V_s Z_s\).

Let \(R_s = \int_0^s r(X_u) du\), so that \(Z_s = e^{-R_s}\) and \(dR_s = r(X_s) ds\). The process \((R_s,s\geq 0)\) is random, since \(r(X_s)\) depends on the path of \((X_u, u \leq s)\), but it has finite variation (it is differentiable in \(s\)), so its quadratic variation vanishes: \((dR_s)^2 = 0\). A direct application of Ito’s formula then yields:

\[ \begin{align*} Z_s &= e^{-R_s}\\ dZ_s &= -e^{-R_s} dR_s + \frac{1}{2}e^{-R_s} (dR_s)^2\\ &= -Z_s r(X_s) ds \end{align*} \]

and

\[ \begin{align*} dV_s &= \frac{\partial}{\partial s}f(t-s, X_s)ds + \frac{\partial}{\partial x}f(t-s, X_s)dX_s + \frac{1}{2}\frac{\partial^2}{\partial x^2}f(t-s,X_s)(dX_s)^2\\ &= -f_t ds + f_x (\sigma(X_s)dB_s + \mu(X_s)ds) + \frac{1}{2}f_{xx} \sigma(X_s)^2 ds \\ &= \sigma(X_s) f_x dB_s + \left\{-f_t + \mu(X_s)f_x + \frac{\sigma(X_s)^2}{2}f_{xx}\right\}ds \end{align*} \]

Recall that \(t\) is fixed here and that we differentiate with respect to \(s\); the subscripts \(f_t\), \(f_x\), \(f_{xx}\) denote partial derivatives of \(f\) with respect to its time and space arguments, evaluated at \((t-s, X_s)\). Since \(f(t,x)\) is a solution of the PDE, the \(ds\) term equals \(r(X_s)f(t-s,X_s)\,ds\), and we can write the second equation as:

\[ dV_s = \sigma(X_s) f_x dB_s + r(X_s) f(t-s,X_s)ds \]

Now, by Ito’s product rule, we finally have:

\[ \begin{align*} dM_s &= V_s dZ_s + Z_s dV_s + dZ_s dV_s\\ &= -f(t-s,X_s)Z_s r(X_s) ds + Z_s (\sigma(X_s) f_x dB_s + r(X_s) f(t-s,X_s)ds) + 0\\ &= \sigma(X_s)Z_s f_x dB_s \end{align*} \]

This proves that \((M_s, s \leq t)\) is a martingale. We conclude that:

\[ \mathbb{E}[M_t] = \mathbb{E}[M_0] \]

Using the definition of \(M_s\), with the diffusion started at \(X_0 = x\) (so that \(M_0 = f(t,X_0) = f(t,x)\)) and using \(f(0,X_t) = g(X_t)\), this yields:

\[ \mathbb{E}\left[g(X_t)\exp\left(-\int_0^t r(X_u) du\right)\right] = \mathbb{E}\left[f(0,X_t)\exp\left(-\int_0^t r(X_u) du\right)\right] = \mathbb{E}[M_t] = \mathbb{E}[M_0] = f(t,x) \]

This proves the theorem. \(\blacksquare\)
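Before moving on, here is a minimal Monte Carlo sketch of this representation (an addition for illustration). It takes standard Brownian motion (\(\sigma = 1\), \(\mu = 0\)), a constant rate \(r(x) = r\) and \(g = \mathbf{1}_{(0,\infty)}\), so that the closed form \(f(t,x) = e^{-rt}\,\mathbb{P}(B_t > 0 \,|\, B_0 = x)\) is available for comparison; the discount factor is accumulated along each path as in the general formula. All parameter values are arbitrary.

import numpy as np
from scipy.stats import norm

# Monte Carlo estimate of f(t, x) = E[ g(B_t) exp(-int_0^t r(B_s) ds) | B_0 = x ]
# for constant r and g = indicator of (0, inf) (illustrative parameters)
rng = np.random.default_rng(4)
r, t, x, n_steps, n_paths = 0.3, 1.0, 0.2, 200, 20_000
dt = t / n_steps
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
paths = x + np.cumsum(increments, axis=1)
rate_along_path = r * np.ones_like(paths)                  # r(X_s); constant here
discount = np.exp(-np.sum(rate_along_path * dt, axis=1))   # exp(-int_0^t r(X_s) ds)
estimate = np.mean(discount * (paths[:, -1] > 0.0))
closed_form = np.exp(-r * t) * (1.0 - norm.cdf(-x / np.sqrt(t)))
print(estimate, closed_form)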

As for the backward equation, it is natural to consider the terminal value problem for the same PDE.

Theorem 8 (Terminal Value Problem) Let \((X_t,t \leq T)\) be a diffusion in \(\mathbb{R}\) with the SDE:

\[ dX_t = \sigma(X_t) dB_t + \mu(X_t) dt \]

Let \(g\in C^2(\mathbb{R})\) be such that \(g\) is \(0\) outside an interval. Then, the solution of the PDE with terminal value

\[ \begin{align*} -\frac{\partial f}{\partial t}(t,x) &= \frac{\sigma(x)^2}{2}\frac{\partial^2 f}{\partial x^2}(t,x) + \mu(x)\frac{\partial f}{\partial x}(t,x) - r(x)f(t,x)\\ f(T,x) &= g(x) \end{align*} \]

has the stochastic representation :

\[ f(t,x) = \mathbb{E}\left[g(X_T)\exp\left(-\int_t^T r(X_u) du\right)\Bigg|X_t = x\right] \]

Proof.

The proof is similar by considering instead

\[ M_t = f(t,X_t)\exp\left(-\int_0^t r(X_u) du\right) \]

Numerical Projects

Temperature of a rod

Consider the initial function \(g(x) = 1 - |x|\) for \(|x| \leq 1\) and \(0\) if \(|x| > 1\). This function may represent the temperature of a rod at time \(0\).

  1. Approximate the solution \(f(t,x)\) to the heat equation at time \(t=0.25\) at every \(0.01\) in \(x\) using the representation Equation 11. Use a sample of \(100\) paths for each \(x\).

Solution.

import numpy as np


# initial temperature of the rod as a function of position x
def g(x):
    if x >= -1.0 and x <= 1.0:
        return 1.0 - np.abs(x)
    else:
        return 0.0


# helper function to generate brownian paths starting at B_0 = x_0
def brownian_paths(num_paths, step_size, t_0, t_n, x_0):
    # round to guard against floating-point issues, e.g. int(0.25 / 0.01) == 24
    num_steps = int(round((t_n - t_0) / step_size))
    db_t = np.sqrt(step_size) * np.random.standard_normal(size=(num_paths, num_steps))
    db_t = np.concatenate([np.full((num_paths, 1), x_0), db_t], axis=1)
    b_t = np.cumsum(db_t, axis=1)
    return b_t


x = np.linspace(-5.0, 5.0, 1001)  # space variable
t = np.linspace(0.0, 1.0, 101)  # time variable

Now, let’s use the data from the problem to compute the specific space average.

# estimate f(0.25, x) = E[ g(B_{0.25}) | B_0 = x ] with 100 paths for each grid point x
f_mc = np.empty_like(x)
for i, x_0 in enumerate(x):
    paths = brownian_paths(num_paths=100, step_size=0.01, t_0=0.0, t_n=0.25, x_0=x_0)
    # the last column of paths is B(0.25); average g over the 100 paths
    f_mc[i] = np.mean([g(b) for b in paths[:, -1]])
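As an optional cross-check (not part of the original exercise), the same \(f(0.25,x)\) can also be computed by integrating \(g\) against the heat kernel, as in Equation 11, using a simple Riemann sum. The sketch reuses the function g and the grid x defined in the code above.

from scipy.stats import norm

# quadrature of Equation 11: f(t, x) = integral of g(y) * heat kernel dy
t_eval = 0.25
u = np.linspace(-1.0, 1.0, 2001)      # g vanishes outside [-1, 1]
du = u[1] - u[0]
g_values = np.array([g(v) for v in u])
f_quad = np.array([np.sum(g_values * norm.pdf(u, loc=x_0, scale=np.sqrt(t_eval))) * du
                   for x_0 in x])
# compare f_quad with the Monte Carlo estimate f_mc computed above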

Exercises

Exercise 1 (Shifted Brownian Motion) Let \((B_t,t\geq 0)\) be a standard Brownian motion. Fix \(t > 0\). Show that the process \((W_s,s \geq 0)\) with \(W_s = B_{t+s} - B_t\) is a standard Brownian motion independent of \(\mathcal{F}_t\).

Solution.

At \(s = 0\), \(W(0) = B(t) - B(t) = 0\).

Consider any arbitrary times \(t_1 < t_2\). We have:

\[\begin{align*} W(t_2) - W(t_1) &= (B(t + t_2) - B(t)) - (B(t + t_1) - B(t))\\ &= B(t + t_2) - B(t + t_1) \end{align*}\]

Now, \(B(t + t_2) - B(t + t_1) \sim \mathcal{N}(0,t_2 - t_1)\). So, \(W(t_2) - W(t_1)\) is a Gaussian random variable with mean \(0\) and variance \(t_2 - t_1\).

Finally, consider any finite set of times \(0=t_0 < t_1 < t_2 < \ldots < t_n = T\). Then, \(t < t + t_1 < t + t_2 < \ldots < t + t_n\). We have that \(B(t + t_1) - B(t)\), \(B(t + t_2) - B(t + t_1)\), \(B(t + t_3) - B(t + t_2)\), \(\ldots\), \(B(t+T) - B(t+t_{n-1})\) are independent random variables. Consequently, \(W(t_1) - W(0)\), \(W(t_2) - W(t_1)\), \(W(t_3) - W(t_2)\), \(\ldots\), \(W(t_n) - W(t_{n-1})\) are independent random variables. Since \(W\) also has continuous paths, \((W_s,s\geq 0)\) is a standard Brownian motion.

Also, we have:

\[\begin{align*} \mathbb{E}[W(s)|\mathcal{F}_t] &= \mathbb{E}[B(t + s) - B(t)|\mathcal{F}_t]\\ & \{ B(t+s) - B(t) \perp \mathcal{F}_t \}\\ &= \mathbb{E}[B(t + s) - B(t)]\\ &= \mathbb{E}[W(s)] \end{align*}\]

This shows that \(W(s)\) is uncorrelated with the past. More generally, any finite vector of increments \((W(s_1),\ldots,W(s_n))\) is built from Brownian increments after time \(t\), which are independent of \(\mathcal{F}_t\) by the independent-increments property of Brownian motion. Hence the whole process \((W_s,s\geq 0)\) is independent of \(\mathcal{F}_t\): it does not depend on the information available up to time \(t\).