Positive Definiteness

Linear Algebra
Author

Quasar

Published

June 20, 2024

Positive Definite Matrices

Definition 1. A real symmetric matrix \(A\) is positive definite (written as \(A \succ 0\)), if for every real vector \(\mathbf{x} \in \mathbf{R}^n \setminus \{\mathbf{0}\}\), the quadratic form satisfies

\[\begin{align*} q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} > 0 \end{align*}\]

The real symmetric matrix \(A\) is positive semi-definite (written as \(A \succeq 0\)) if:

\[\begin{align*} q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x} \geq 0 \end{align*}\]

for all \(\mathbf{x} \in \mathbf{R}^n\).

Intuitively, positive definite matrices are like strictly positive real numbers. Consider a scalar \(a > 0\). The sign of \(ax\) matches the sign of \(x\), so \(x\cdot ax > 0\) for every \(x \neq 0\). However, if \(a < 0\), multiplication flips the sign, so \(x\) and \(ax\) have opposite signs and \(x \cdot ax < 0\).
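As a quick numerical sanity check (a minimal sketch using NumPy; the matrix below is just an illustrative choice), we can evaluate the quadratic form at a few random nonzero vectors and confirm that it is always positive:

import numpy as np

rng = np.random.default_rng(0)

# An illustrative symmetric positive definite matrix
A = np.array([[4.0, 3.0],
              [3.0, 4.0]])

# Evaluate q(x) = x^T A x at random nonzero vectors
for _ in range(5):
    x = rng.standard_normal(2)
    print(x @ A @ x)   # every printed value should be > 0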

If \(A \succ 0\), then by definition \(\mathbf{x}^T A \mathbf{x} > 0\) for all nonzero \(\mathbf{x} \in \mathbf{R}^n\). Thus, the vector \(\mathbf{x}\) and its transformed self \(A\mathbf{x}\) make an angle \(\theta \in \left(-\frac{\pi}{2},\frac{\pi}{2}\right)\). Both \(\mathbf{x}\) and \(A\mathbf{x}\) lie on the same side of the hyperplane \(\mathbf{x}^{\perp}\).

%load_ext itikz
%%itikz --temp-dir --tex-packages=tikz --tikz-libraries=arrows --implicit-standalone
\begin{tikzpicture}[scale=1.5]
\draw [->] (-2,0) -- (6,0) node [below] {\huge $x_1$};
\draw [->] (0,-2) -- (0,6) node [above] {\huge $x_2$};
\draw [->] (0,0) -- (3,1) node [above] {\huge $\mathbf{x}$};
\draw [->] (0,0) -- (2,5) node [above] {\huge $A\mathbf{x}$};
\draw [dashed] (-2,6) -- (1,-3) node [] {\huge $\mathbf{x}^{\perp}$};
\draw[draw=green!50!black,thick] (1,1/3) arc (18.43:68.19:1) node [midway,above] {\huge $\theta$};
\end{tikzpicture}
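We can also check the angle claim numerically (an illustrative sketch; the matrix and vector below are arbitrary choices, not the ones used to draw the figure):

import numpy as np

A = np.array([[4.0, 3.0],
              [3.0, 4.0]])   # illustrative SPD matrix
x = np.array([3.0, 1.0])
Ax = A @ x

# cos(theta) = <x, Ax> / (||x|| ||Ax||); x^T A x > 0 forces |theta| < pi/2
cos_theta = (x @ Ax) / (np.linalg.norm(x) * np.linalg.norm(Ax))
print(np.degrees(np.arccos(cos_theta)))   # strictly less than 90 degrees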

Convex functions

There is a second geometric way to think about positive definite matrices: a quadratic form is strictly convex when its matrix is symmetric and positive definite.

Definition 2. (Convex function) A function \(f:\mathbf{R}^n \to \mathbf{R}\) is convex if, for all \(\mathbf{x},\mathbf{y}\in \mathbf{R}^n\) and \(0 \leq \lambda \leq 1\), we have:

\[\begin{align*} f(\lambda \mathbf{x} + (1-\lambda)\mathbf{y}) \leq \lambda f(\mathbf{x}) + (1-\lambda) f(\mathbf{y}) \tag{1} \end{align*}\]

Proposition 3. Assume that the function \(f:\mathbf{R}^n \to \mathbf{R}\) is differentiable. Then, \(f\) is convex if and only if, for all \(\mathbf{x},\mathbf{y} \in \mathbf{R}^n\), the inequality

\[\begin{align*} f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T (\mathbf{y} - \mathbf{x}) \tag{2} \end{align*}\]

is satisfied.

Proof.

\(\Longrightarrow\) direction.

Assume that \(f\) is convex and let \(\mathbf{x}, \mathbf{y} \in \mathbf{R}^n\) with \(\mathbf{x} \neq \mathbf{y}\). The convexity of \(f\) implies that:

\[\begin{align*} f((\mathbf{x} + \mathbf{y})/2) \leq \frac{1}{2}f(\mathbf{x}) + \frac{1}{2}f(\mathbf{y}) \end{align*}\]

Denote now \(\mathbf{h} = \mathbf{y}-\mathbf{x}\). Then this inequality reads:

\[\begin{align*} f(\mathbf{x} + \mathbf{h}/2) \leq \frac{1}{2} f(\mathbf{x}) + \frac{1}{2}f(\mathbf{x} + \mathbf{h}) \end{align*}\]

Using elementary transformations, we have:

\[\begin{align*} 2 f(\mathbf{x} + \mathbf{h}/2) &\leq f(\mathbf{x}) + f(\mathbf{x} + \mathbf{h}) \\ f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) &\geq \frac{f(\mathbf{x} + \mathbf{h}/2) - f(\mathbf{x})}{1/2} \end{align*}\]

Repeating this argument with \(\mathbf{h}\) replaced by \(\mathbf{h}/2\):

\[\begin{align*} f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) \geq \frac{f(\mathbf{x} + \mathbf{h}/2) - f(\mathbf{x})}{1/2} \geq \frac{f(\mathbf{x} + \mathbf{h}/4) - f(\mathbf{x})}{1/4} \end{align*}\]

Consequently, by induction, for every \(k \geq 0\),

\[\begin{align*} f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) \geq \frac{f(\mathbf{x} + 2^{-k}\mathbf{h}) - f(\mathbf{x})}{2^{-k}} \end{align*}\]

By the order limit theorem,

\[\begin{align*} f(\mathbf{x} + \mathbf{h}) - f(\mathbf{x}) \geq \lim_{k \to \infty}\frac{f(\mathbf{x} + 2^{-k}\mathbf{h}) - f(\mathbf{x})}{2^{-k}} = D_{\mathbf{h}}f(\mathbf{x}) = \nabla f(\mathbf{x}) \cdot (\mathbf{y} - \mathbf{x}) \end{align*}\]

Recalling that \(\mathbf{h} = \mathbf{y}-\mathbf{x}\), we have:

\[\begin{align*} f(\mathbf{y}) \geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T (\mathbf{y} - \mathbf{x}) \end{align*}\]

\(\Longleftarrow\) direction.

Let \(\mathbf{w}, \mathbf{z} \in \mathbf{R}^n\) and \(0 \leq \lambda \leq 1\). Moreover, denote:

\[\begin{align*} \mathbf{x} := \lambda \mathbf{w} + (1-\lambda)\mathbf{z} \end{align*}\]

Then, the inequality in (2) implies that:

\[\begin{align*} f(\mathbf{w}) &\geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T (\mathbf{w} - \mathbf{x})\\ f(\mathbf{z}) &\geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T (\mathbf{z} - \mathbf{x}) \tag{3} \end{align*}\]

Note moreover that:

\[\begin{align*} \mathbf{w} - \mathbf{x} &= (1-\lambda)(\mathbf{w}-\mathbf{z})\\ \mathbf{z} - \mathbf{x} &= \lambda(\mathbf{z}-\mathbf{w}) \end{align*}\]

Thus, if we multiply the first line in (3) with \(\lambda\) and the second line with \((1-\lambda)\) and then add the two inequalities, we obtain:

\[\begin{align*} \lambda f(\mathbf{w}) + (1-\lambda)f(\mathbf{z}) &\geq f(\mathbf{x}) + \nabla f(\mathbf{x})^T[\lambda(1-\lambda)(\mathbf{w} - \mathbf{z}) + \lambda(1-\lambda)(\mathbf{z} - \mathbf{w})]\\ &= f(\mathbf{x})\\ &=f(\lambda \mathbf{w} + (1-\lambda)\mathbf{z}) \end{align*}\]

since the bracketed term vanishes.

Since \(\mathbf{w}\) and \(\mathbf{z}\) were arbitrary, this proves the convexity of \(f\).

The convexity of a differentiable function can either be characterized by the fact that all secants lie above the graph or that all tangents lie below the graph.
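A small numerical illustration of Proposition 3 (a sketch assuming NumPy; the convex quadratic below is an arbitrary choice): the tangent-plane inequality (2) should hold at random pairs of points.

import numpy as np

rng = np.random.default_rng(1)

A = np.array([[4.0, 3.0],
              [3.0, 4.0]])   # SPD, so f below is convex

def f(v):
    return v @ A @ v

def grad_f(v):
    return 2 * A @ v          # gradient of x^T A x for symmetric A

# Check f(y) >= f(x) + grad f(x)^T (y - x) at random pairs of points
for _ in range(5):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    print(f(y) >= f(x) + grad_f(x) @ (y - x))   # expected: True every time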

We state the next corollary without proof.

Corollary 4. Assume that \(f:\mathbf{R}^n \to \mathbf{R}\) is convex and differentiable. Then \(\mathbf{x}^*\) is a global minimizer of \(f\), if and only if \(\nabla f(\mathbf{x}^{*}) = \mathbf{0}\).
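For example (an illustrative sketch, not part of the corollary itself), for the convex quadratic \(f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A \mathbf{x} - \mathbf{b}^T\mathbf{x}\) with \(A \succ 0\), the condition \(\nabla f(\mathbf{x}^*) = A\mathbf{x}^* - \mathbf{b} = \mathbf{0}\) reduces to solving the linear system \(A\mathbf{x}^* = \mathbf{b}\):

import numpy as np

A = np.array([[4.0, 3.0],
              [3.0, 4.0]])   # SPD, so f below is convex
b = np.array([1.0, 2.0])

def f(v):
    return 0.5 * v @ A @ v - b @ v

# grad f(x) = A x - b, so the unique critical point solves A x = b
x_star = np.linalg.solve(A, b)

# By Corollary 4, the critical point is the global minimizer
rng = np.random.default_rng(2)
print(all(f(x_star) <= f(x_star + rng.standard_normal(2)) for _ in range(1000)))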

Hessians of convex functions

Proposition 5. (Second derivative test) Let \(f:X\subseteq\mathbf{R}^n \to \mathbf{R}\) be a \(C^2\) function and suppose that \(\mathbf{a}\in X\) is a critical point of \(f\). If the Hessian \(\nabla^2 f(\mathbf{a})\) is positive definite, then \(f\) has a local minimum at \(\mathbf{a}\).

Proof.

Let \(q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}\) be a quadratic form. For any scalar \(\lambda\), we have:

\[\begin{align*} q(\lambda \mathbf{x}) &= (\lambda \mathbf{x})^T A (\lambda \mathbf{x})\\ &= \lambda^2 \mathbf{x}^T A \mathbf{x}\\ &= \lambda^2 q(\mathbf{x}) \tag{4} \end{align*}\]

We show that if \(A\) is the symmetric matrix associated with a positive definite quadratic form \(q(\mathbf{x})\), then there exists \(M > 0\) such that:

\[\begin{align*} q(\mathbf{h}) \geq M ||\mathbf{h}||^2 \tag{5} \end{align*}\]

for all \(\mathbf{h} \in \mathbf{R}^n\).

First note that when \(\mathbf{h} = \mathbf{0}\), then \(q(\mathbf{h})=q(\mathbf{0})=0\), so the conclusion holds trivially in this case.

Next, suppose that \(\mathbf{h}\) is a unit vector, that is, \(||\mathbf{h}||=1\). The set \(S\) of all unit vectors in \(\mathbf{R}^n\) is an \((n-1)\)-dimensional hypersphere, which is a compact set. By the extreme-value theorem, the restriction of the continuous function \(q\) to \(S\) must achieve a global minimum value \(M\) somewhere on \(S\). Since \(q\) is positive definite and every point of \(S\) is nonzero, \(M > 0\). Thus, \(q(\mathbf{h}) \geq M\) for all \(\mathbf{h} \in S\).

Finally, let \(\mathbf{h}\) be any nonzero vector in \(\mathbf{R}^n\). Then, its normalization \(\mathbf{h}/||\mathbf{h}||\) is a unit vector and lies in \(S\). Therefore, using (4) and the bound above, we have:

\[\begin{align*} q(\mathbf{h}) &= q\left(||\mathbf{h}|| \cdot \frac{\mathbf{h}}{||\mathbf{h}||} \right)\\ &= ||\mathbf{h}||^2 q\left(\frac{\mathbf{h}}{||\mathbf{h}||}\right)\\ &\geq M ||\mathbf{h}||^2 \end{align*}\]

We can now prove the theorem.

By the second order Taylor’s formula, we have that, for the critical point \(\mathbf{a}\) of \(f\),

\[\begin{align*} f(\mathbf{x}) = f(\mathbf{a}) + \nabla f(\mathbf{a})\cdot(\mathbf{x} - \mathbf{a}) + \frac{1}{2}(\mathbf{x} - \mathbf{a})^T \nabla^2 f(\mathbf{a}) (\mathbf{x} - \mathbf{a}) + R_2(\mathbf{x},\mathbf{a}) \tag{6} \end{align*}\]

where \(R_2(\mathbf{x},\mathbf{a})/||\mathbf{x}-\mathbf{a}||^2 \to 0\) as \(\mathbf{x} \to \mathbf{a}\).

If \(\nabla^2 f(\mathbf{a}) \succ 0\), then applying (5) to the positive definite quadratic form \(\mathbf{h} \mapsto \frac{1}{2}\mathbf{h}^T \nabla^2 f(\mathbf{a})\,\mathbf{h}\), there exists \(M > 0\) such that

\[\begin{align*} \frac{1}{2}(\mathbf{x} - \mathbf{a})^T \nabla^2 f(\mathbf{a}) (\mathbf{x} - \mathbf{a}) \geq M||\mathbf{x} - \mathbf{a}||^2 \tag{7} \end{align*}\]

Pick \(\epsilon = M\). Since \(R_2(\mathbf{x},\mathbf{a})/||\mathbf{x} - \mathbf{a}||^2 \to 0\) as \(\mathbf{x} \to \mathbf{a}\), by the definition of the limit there exists \(\delta > 0\) such that \(|R_2(\mathbf{x},\mathbf{a})|/||\mathbf{x} - \mathbf{a}||^2 < M\) whenever \(0 < ||\mathbf{x} - \mathbf{a}||<\delta\). Or equivalently,

\[\begin{align*} |R_2(\mathbf{x},\mathbf{a})| < M||\mathbf{x}-\mathbf{a}||^2 \end{align*}\]

that is:

\[\begin{align*} -M||\mathbf{x}-\mathbf{a}||^2 < R_2(\mathbf{x},\mathbf{a}) < M||\mathbf{x}-\mathbf{a}||^2 \tag{8} \end{align*}\]

Putting together (6), (7) and (8), and recalling that \(\nabla f(\mathbf{a}) = \mathbf{0}\) because \(\mathbf{a}\) is a critical point,

\[\begin{align*} f(\mathbf{x}) - f(\mathbf{a}) > 0 \end{align*}\]

for all \(\mathbf{x}\) with \(0 < ||\mathbf{x} - \mathbf{a}|| < \delta\), so that \(f\) has a local minimum at \(\mathbf{a}\).
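As a concrete instance of the second derivative test (the function below is an illustrative choice): \(f(x,y) = x^2 + xy + y^2\) has a critical point at the origin, and its Hessian there is positive definite, so the origin is a local minimum.

import numpy as np

# f(x, y) = x^2 + x y + y^2 has a critical point at the origin;
# its (constant) Hessian is:
H = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(H)   # eigvalsh is specialized to symmetric matrices
print(eigenvalues)                    # [1. 3.] -- all positive
print(np.all(eigenvalues > 0))        # Hessian is positive definite => local minimum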

Proposition 6. A twice differentiable function \(f:\mathbf{R}^n \to \mathbf{R}\) is convex, if and only if the Hessian \(\nabla^2 f(\mathbf{x})\) is positive semi-definite for all \(\mathbf{x}\in\mathbf{R}^n\).

Proof.

Assume first that \(f\) is convex and let \(\mathbf{x}\in\mathbf{R}^n\). Define \(g:\mathbf{R}^n \to \mathbf{R}\) as a function of the vector \(\mathbf{y}\) by setting:

\[\begin{align*} g(\mathbf{y}) := f(\mathbf{y}) - \nabla f(\mathbf{x})^T (\mathbf{y} - \mathbf{x}) \end{align*}\]

Consider the mapping \(T(\mathbf{y}) = -\nabla f(\mathbf{x})^T (\mathbf{y} - \mathbf{x})\). We have:

\[\begin{align*} T(\lambda \mathbf{y}_1 + (1-\lambda)\mathbf{y}_2) &= -\nabla f(\mathbf{x})^T (\lambda \mathbf{y}_1 + (1-\lambda)\mathbf{y}_2 - \mathbf{x}) \\ &= \lambda [-\nabla f(\mathbf{x})^T (\mathbf{y}_1 - \mathbf{x})] + (1-\lambda)[-\nabla f(\mathbf{x})^T (\mathbf{y}_2 - \mathbf{x})]\\ &=\lambda T(\mathbf{y}_1) + (1-\lambda)T(\mathbf{y}_2) \end{align*}\]

Thus, \(T\) is an affine transformation.

Since affine transformations are convex and \(f\) is convex, their sum \(g\) is also convex. Moreover, treating \(\mathbf{x}\) as a constant and differentiating \(g\) with respect to \(\mathbf{y}\), we have:

\[\begin{align*} \nabla g(\mathbf{y}) = \nabla f(\mathbf{y}) - \nabla f(\mathbf{x}) \end{align*}\]

and

\[\begin{align*} \nabla^2 g(\mathbf{y}) = \nabla^2 f(\mathbf{y}) \end{align*}\]

for all \(\mathbf{y} \in \mathbf{R}^n\). In particular, \(\nabla g(\mathbf{x}) = 0\). Thus, corollary (4) implies that \(\mathbf{x}\) is a global minimizer of \(g\). Now, the second order necessary condition for a minimizer implies that \(\nabla^2 g(\mathbf{x})\) is positive semi-definite. Since \(\nabla^2 g(\mathbf{x}) = \nabla^2 f(\mathbf{x})\), this proves that the Hessian of \(f\) is positive semi-definite for all \(\mathbf{x} \in \mathbf{R}^n\).

The converse also holds: \(f\) is convex whenever its Hessian is everywhere positive semi-definite (we omit the proof of this direction). This gives a practical test for whether a given function is convex.
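For instance (a numerical spot check, not a proof; the function below is an illustrative choice), we can sample the Hessian of \(f(x,y) = e^x + e^y\), which is \(diag(e^x, e^y)\), at random points and confirm that it is positive semi-definite everywhere we look:

import numpy as np

def hessian(x, y):
    # Hessian of f(x, y) = exp(x) + exp(y)
    return np.diag([np.exp(x), np.exp(y)])

rng = np.random.default_rng(3)
print(all(
    np.all(np.linalg.eigvalsh(hessian(*rng.uniform(-3, 3, size=2))) >= 0)
    for _ in range(100)
))   # True at every sampled point, consistent with convexity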

Tests for positive definiteness

One of the most important theorems about finite-dimensional vector spaces is the spectral theorem: every real symmetric matrix \(A\) is orthogonally diagonalizable. That is, \(A\) admits the factorization \(A = Q\Lambda Q^T\), where \(Q\) is an orthogonal matrix and \(\Lambda = diag(\lambda_1,\ldots,\lambda_n)\) holds the eigenvalues of \(A\).

From basic algebra, we know that if \(A\) is a non-singular matrix with all its pivot elements \(a_{kk}^{(k)}\) non-zero in the Gaussian elimination process, then \(A=LDU\), where \(L\) and \(U\) are lower and upper unitriangular matrices and \(D\) is a diagonal matrix consisting of the pivots of \(A\). If \(A\) is symmetric, then it admits the unique factorization \(A = LDL^T\).
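SciPy exposes this factorization directly (a sketch assuming scipy.linalg.ldl is available); for a symmetric positive definite matrix, the returned \(D\) is diagonal and holds the pivots:

import numpy as np
from scipy.linalg import ldl

A = np.array([[4.0, 3.0],
              [3.0, 4.0]])

L, D, perm = ldl(A)                  # A = L D L^T
print(L)                             # unit lower triangular
print(np.diag(D))                    # pivots: 4 and 4 - 3^2/4 = 1.75
print(np.allclose(L @ D @ L.T, A))   # True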

Consider the quadratic form \(q(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}\). Substituting \(A = Q \Lambda Q^T\), we have:

\[\begin{align*} q(\mathbf{x}) &= \mathbf{x}^T A \mathbf{x} \\ &= \mathbf{x}^T Q \Lambda Q^T \mathbf{x} \\ &= (Q^T \mathbf{x})^T \Lambda (Q^T \mathbf{x}) \tag{9} \end{align*}\]

But, the matrix \(Q = [\mathbf{q}_1,\mathbf{q}_2,\ldots,\mathbf{q}_n]\) has the vectors \(\mathbf{q}_i\) as its columns. Moreover, \(A=Q\Lambda Q^T\) implies that \(AQ = Q\Lambda Q^T Q = Q\Lambda\). Therefore:

\[\begin{align*} A\begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \ldots & \mathbf{q}_n \end{bmatrix}=\begin{bmatrix} \mathbf{q}_1 & \mathbf{q}_2 & \ldots & \mathbf{q}_n \end{bmatrix}\begin{bmatrix} \lambda_1 \\ & \lambda_2 \\ & & \ddots \\ & & & \lambda_n \end{bmatrix} \end{align*}\]

So, \(A\mathbf{q}_i = \lambda_i \mathbf{q}_i\): the columns \(\mathbf{q}_1,\ldots,\mathbf{q}_n\) are the eigenvectors of \(A\), with eigenvalues \(\lambda_1,\ldots,\lambda_n\). Now,

\[\begin{align*} \mathbf{q}_1 &= [q_{11}, q_{21}, \ldots,q_{n1}]^T = q_{11} \mathbf{e}_1 + q_{21} \mathbf{e}_2 + \ldots + q_{n1} \mathbf{e}_n\\ \mathbf{q}_2 &= [q_{12}, q_{22}, \ldots,q_{n2}]^T = q_{12} \mathbf{e}_1 + q_{22} \mathbf{e}_2 + \ldots + q_{n2} \mathbf{e}_n\\ \vdots \\ \mathbf{q}_n &= [q_{1n}, q_{2n}, \ldots,q_{nn}]^T = q_{1n} \mathbf{e}_1 + q_{2n} \mathbf{e}_2 + \ldots + q_{nn} \mathbf{e}_n \end{align*}\]

So,

\[\begin{align*} Q = \begin{bmatrix} q_{11} & \ldots & q_{1n}\\ \vdots & & \vdots \\ q_{n1} & \ldots & q_{nn} \end{bmatrix} \end{align*}\]

is the change of basis matrix from the standard basis \(\mathcal{B}_{old}=\{\mathbf{e}_1,\ldots,\mathbf{e}_n\}\) to the eigenvector basis \(\mathcal{B}_{new}=\{\mathbf{q}_1,\ldots,\mathbf{q}_n\}\).

If \(\mathbf{x}\) are the coordinates of a vector in the standard basis and \(\mathbf{y}\) are its coordinates in the eigenvector basis, then \(\mathbf{x}=Q\mathbf{y}\).

Hence, substituting \(\mathbf{y}=Q^{-1}\mathbf{x}=Q^T \mathbf{x}\) in equation (9), the quadratic form becomes:

\[\begin{align*} q(\mathbf{x}) &= \mathbf{x}^T A \mathbf{x} = \mathbf{y}^T \Lambda \mathbf{y}\\ &=\lambda_1 y_1^2 + \lambda_2 y_2^2 + \ldots + \lambda_n y_n^2 \end{align*}\]

where we have changed the axes to be aligned across the eigenvectors of \(A\).
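We can confirm this change of variables numerically (a minimal sketch using NumPy): for \(\mathbf{y} = Q^T\mathbf{x}\), the quadratic form equals \(\lambda_1 y_1^2 + \ldots + \lambda_n y_n^2\).

import numpy as np

A = np.array([[4.0, 3.0],
              [3.0, 4.0]])
eigenvalues, Q = np.linalg.eigh(A)   # columns of Q are orthonormal eigenvectors

rng = np.random.default_rng(4)
x = rng.standard_normal(2)
y = Q.T @ x                          # coordinates in the eigenvector basis

print(x @ A @ x)                     # quadratic form in standard coordinates
print(np.sum(eigenvalues * y**2))    # sum of lambda_i y_i^2 -- same value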

The coefficients \(\lambda_i\) are the diagonal entries of \(\Lambda\), that is, the eigenvalues of \(A\). The quadratic form is strictly positive for all \(\mathbf{y} \neq \mathbf{0}\), if and only if the eigenvalues satisfy \(\lambda_1 > 0, \lambda_2 > 0, \ldots, \lambda_n > 0\).

Theorem 7. (Positive definiteness) Let \(A \in \mathbf{R}^{n \times n}\) be a real symmetric matrix. Then, the following statements are equivalent:

  1. \(A\) is positive definite.

  2. \(A\) is non-singular and has positive pivot elements when performing Gaussian elimination (without row exchanges).

  3. \(A\) admits a factorization \(A = Q \Lambda Q^T\), where \(\Lambda = diag(\lambda_1,\ldots,\lambda_n)\) with \(\lambda_i > 0\) for all \(i=1,2,3,\ldots,n\).
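The two tests agree in practice. Here is a minimal sketch that checks both criteria on a classic SPD matrix (the pivots helper below is a hypothetical illustration written for clarity, not a library routine):

import numpy as np

def pivots(A):
    # Pivots from Gaussian elimination without row exchanges
    U = A.astype(float)
    n = U.shape[0]
    for k in range(n):
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return np.diag(U)

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

print(pivots(A))               # [2. 1.5 1.333...] -- all positive
print(np.linalg.eigvalsh(A))   # all positive as well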

Cholesky Factorization

We can push the result above slightly further in the positive definite case. Since each eigenvalue \(\lambda_i\) is positive, the quadratic form can be written as a sum of squares:

\[\begin{align*} \lambda_1 y_1^2 + \lambda_2 y_2^2 + \ldots + \lambda_n y_n^2 &= (\sqrt{\lambda_1} y_1)^2 + \ldots + (\sqrt{\lambda_n} y_n)^2\\ &= z_1^2 + z_2^2 + \ldots + z_n^2 \end{align*}\]

where \(z_i =\sqrt{\lambda_i}y_i\). In the matrix form, we are writing:

\[\begin{align*} \hat{q}(\mathbf{y}) &= \mathbf{y}^T \Lambda \mathbf{y}\\ &= \mathbf{z}^T \mathbf{z}\\ &= ||\mathbf{z}||^2 \end{align*}\]

where \(\mathbf{z} = S\mathbf{y}\) with \(S=diag(\sqrt{\lambda_1},\ldots,\sqrt{\lambda_n})\). Since \(\Lambda = S^2=SS^T\), \(S\) can be thought of as the square root of the matrix \(\Lambda\). Substituting back into the equation \(A=Q\Lambda Q^T\), we deduce the factorization:

\[\begin{align*} A &= Q\Lambda Q^T\\ &= QS S^T Q^T\\ &= MM^T \end{align*}\]

with \(M = QS\). Strictly speaking, the Cholesky factorization takes \(M\) lower triangular, and \(QS\) is generally not triangular; but this shows that every positive definite matrix can be written in the form \(MM^T\).
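NumPy computes a triangular Cholesky factor directly; the construction above gives a different, non-triangular factor \(M = QS\) that satisfies the same identity (a minimal sketch with an illustrative matrix):

import numpy as np

A = np.array([[4.0, 3.0],
              [3.0, 4.0]])

# Built-in Cholesky factor: lower triangular M with A = M M^T
M = np.linalg.cholesky(A)
print(np.allclose(M @ M.T, A))               # True

# The construction in the text: M' = Q S with S = sqrt(Lambda)
eigenvalues, Q = np.linalg.eigh(A)
M_prime = Q @ np.diag(np.sqrt(eigenvalues))
print(np.allclose(M_prime @ M_prime.T, A))   # also True, though M' is not triangular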

Level plots of a positive definite quadratic form are ellipsoids

Consider the level plot of a positive definite quadratic form \(q(\mathbf{x})\):

\[\begin{align*} q(\mathbf{x}) = \hat{q}(\mathbf{y}) &= 1 \\ \lambda_1 y_1^2 + \ldots + \lambda_n y_n^2 &= 1\\ \frac{y_1^2}{\left(\frac{1}{\sqrt{\lambda_1}}\right)^2}+\frac{y_2^2}{\left(\frac{1}{\sqrt{\lambda_2}}\right)^2} + \ldots + \frac{y_n^2}{\left(\frac{1}{\sqrt{\lambda_n}}\right)^2} &= 1 \end{align*}\]

Thus, the level plot of a positive definite quadratic form is an ellipse (if \(n=2\)) or an ellipsoid (if \(n > 2\)) with axes aligned along the eigenvectors and semi-axis lengths \(\frac{1}{\sqrt{\lambda_i}}\), \(i=1,2,3,\ldots,n\).

import numpy as np
import matplotlib.pyplot as plt

A = np.array([[4, 3], [3, 4]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# The parametric equation of an ellipse is
# (y1(theta), y2(theta)) = (a cos theta, b sin theta),
# where the semi-axis lengths are a = 1/sqrt(lambda_1), b = 1/sqrt(lambda_2)
theta = np.linspace(0, 2 * np.pi, 10000)
y1 = np.sqrt(1 / eigenvalues[0]) * np.cos(theta)
y2 = np.sqrt(1 / eigenvalues[1]) * np.sin(theta)

Y = np.array([y1, y2])

# The change of basis matrix from the standard basis to the eigenvector basis
# is Q, whose columns q_1, q_2 are the eigenvectors of A (np.linalg.eig returns
# eigenvectors as columns). So, x = Q y.
Q = eigenvectors
X = Q @ Y
x1 = X[0, :]
x2 = X[1, :]

plt.xlim([-1, 1])
plt.gca().set_aspect('equal')   # equal axis scaling, so the ellipse is not distorted
plt.grid(True)
plt.title(r'$q(\mathbf{x})=\mathbf{x}^T A \mathbf{x} = 1$')
plt.xlabel(r'$x_1$')
plt.ylabel(r'$x_2$')
plt.plot(x1, x2)
plt.show()