In the previous article, we discussed the camera transformation, which maps a vertex from world space into camera space. Recall that the camera spans an orthonormal coordinate system with the three vectors $$\vec{u}$$, $$\vec{v}$$ and $$\vec{w}$$, where $$-\vec{w}$$ points along the viewing direction.

In this section, we deal with projecting a 3D vertex from camera space onto a 2D view plane. In OpenGL, the projection is followed by clipping and by the mapping to so-called normalized device coordinates, both of which are tightly coupled to the construction of the projection matrix. In fact, the purely mathematical construction of a projection matrix is easy; what makes it difficult is the clipping part.

As you may remember from school, there are two important kinds of projection:

1. orthogonal projection, and
2. perspective projection.

We will cover both successively.

## Orthographic Projection Matrix

Orthogonal projection itself is pretty straightforward: you simply drop the depth coordinate of the projected 2D point, i.e. its $$z$$-coordinate (the component along the camera's $$\vec{w}$$ axis). By doing so, you discard the depth information. So a pure orthogonal projection matrix looks like

$$P_{ortho} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Note that it looks almost like the identity matrix, except that the 1 in the third row is replaced by 0. This 0 causes the third component to be dropped when a vector is multiplied by this matrix.
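To make this concrete, here is a minimal Java sketch that applies $$P_{ortho}$$ to a homogeneous point. It assumes a plain row-major `float[4][4]` array instead of a matrix class, and the class name `PureOrthographic` is hypothetical. Whatever the input, the third component of the result is 0, so the depth is gone.

```java
/** Hypothetical helper: applies the pure orthogonal projection to a homogeneous point. */
final class PureOrthographic {

    /** Identity matrix with the third (z) row zeroed out, stored row-major. */
    static final float[][] P_ORTHO = {
        {1, 0, 0, 0},
        {0, 1, 0, 0},
        {0, 0, 0, 0},
        {0, 0, 0, 1}
    };

    /** Returns m * v for a row-major 4x4 matrix m and a homogeneous column vector v. */
    static float[] multiply(float[][] m, float[] v) {
        float[] result = new float[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                result[row] += m[row][col] * v[col];
            }
        }
        return result;
    }
}
```

For example, multiplying `P_ORTHO` with `{3, 4, -5, 1}` yields `{3, 4, 0, 1}`.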

However, OpenGL needs to clip triangles that lie outside the viewing volume, so it requires a way to quickly determine whether a given vertex lies inside or outside the clipping volume. The general idea is to scale the viewing volume to a box whose faces lie at -1 and +1 along all three coordinate axes. More concretely, it is mapped to the so-called unit cube (which, despite the name, has side length 2) with its minimum corner at (-1,-1,-1) and its maximum corner at (1,1,1). The faces of the viewing volume are called clipping planes and are named $$left$$, $$right$$, $$top$$, $$bottom$$, $$near$$ and $$far$$. The coordinate system of the cube is referred to as clip space. In camera space, the viewing volume lies along the camera's line of sight, i.e. the negative z axis.

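Once a point is in clip space, OpenGL checks whether each of its coordinates lies between $$-\omega$$ and $$+\omega$$, where $$\omega$$ is the point's homogeneous coordinate discussed below. In the orthographic case $$\omega$$ is 1, so this is simply the range from -1 to +1. If a coordinate falls outside this range, the vertex lies outside the clip volume, and the primitive it belongs to is clipped: discarded entirely, or cut back to the part that lies inside.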
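The vertex test can be sketched in a few lines of Java. This is a simplified illustration with a hypothetical class name: it only classifies a single homogeneous vertex against $$-\omega \le x, y, z \le \omega$$, while real clipping additionally cuts triangles that straddle the boundary.

```java
/** Hypothetical helper: classifies one clip-space vertex (x, y, z, w). */
final class ClipVolume {

    /**
     * Returns true if -w <= x, y, z <= w. For the orthographic case w == 1,
     * which reduces to the plain [-1, +1] range check.
     */
    static boolean inside(float x, float y, float z, float w) {
        return -w <= x && x <= w
            && -w <= y && y <= w
            && -w <= z && z <= w;
    }
}
```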

Once clipping has been performed, OpenGL divides the first three components of each clip-space point by its last component $$\omega$$. This is a valid operation in homogeneous coordinates. Recall that homogeneous coordinates extend a point by an additional component $$\omega$$, giving a vector $$(x, y, z, \omega)$$, usually with $$\omega = 1$$; any such vector with $$\omega \neq 0$$ represents the same 3D point as $$(x/\omega, y/\omega, z/\omega, 1)$$. When $$\omega$$ is 1, the division leaves the point unchanged. The result is the point in normalized device coordinates (NDC), from where it is processed further toward the fragment shader.
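A sketch of the division itself, again with a hypothetical class name and the clip-space vertex stored as a `float[4]` in the order $$(x, y, z, \omega)$$:

```java
/** Hypothetical helper: the perspective divide from clip space to NDC. */
final class PerspectiveDivide {

    /** Divides x, y and z by the homogeneous component w and returns (x/w, y/w, z/w). */
    static float[] toNdc(float[] clip) {
        float w = clip[3];
        if (w == 0f) {
            throw new IllegalArgumentException("w must be non-zero");
        }
        return new float[] {clip[0] / w, clip[1] / w, clip[2] / w};
    }
}
```

For example, `toNdc(new float[] {2, -4, 1, 2})` returns `{1, -2, 0.5}`.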

Clipping and the conversion to NDC happen internally in OpenGL and are fairly simple. What is less simple is the projection itself in combination with the mapping to clip space.

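To center the viewing volume around the origin, we subtract $$\frac{left+right}{2}$$ (the mid-point between $$left$$ and $$right$$) from the x coordinate. We then scale the result to the range -1 to +1: the interval has length $$right-left$$ and must be stretched to length $$1-(-1)=2$$, so we scale by $$\frac{2}{right-left}$$. We do the same for the y and z coordinates. Here, $$near$$ and $$far$$ are treated as the z-coordinates of the near and far planes; OpenGL's `glOrtho` instead takes them as positive distances along $$-z$$, which negates the third diagonal entry to $$-\frac{2}{far-near}$$. We now express the translation $$T$$ and the scaling $$S$$ as one matrix product.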

$$ST = \begin{pmatrix} \frac{2}{right-left} & 0 & 0 & 0 \\ 0 & \frac{2}{top-bottom} & 0 & 0 \\ 0 & 0 & \frac{2}{far-near} & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & -\frac{left+right}{2} \\ 0 & 1 & 0 & -\frac{top+bottom}{2} \\ 0 & 0 & 1 & -\frac{far+near}{2} \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

which yields

$$P = \begin{pmatrix} \frac{2}{right-left} & 0 & 0 & -\frac{right+left}{right-left} \\ 0 & \frac{2}{top-bottom} & 0 & -\frac{top+bottom}{top-bottom} \\ 0 & 0 & \frac{2}{far-near} & -\frac{far+near}{far-near} \\ 0 & 0 & 0 & 1 \end{pmatrix}$$
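To convince yourself that $$ST$$ really equals $$P$$, the following Java sketch builds both matrices from the same bounds so they can be compared entry by entry, and includes a helper to apply a matrix to a point. It uses row-major `float[4][4]` arrays and hypothetical method names and, like the derivation, treats $$near$$ and $$far$$ as z-coordinates of the planes.

```java
/** Hypothetical helper: the orthographic clip-space matrix built two ways (row-major). */
final class OrthographicClipMatrix {

    /** Returns the matrix product a * b. */
    static float[][] multiply(float[][] a, float[][] b) {
        float[][] result = new float[4][4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 4; k++) {
                    result[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return result;
    }

    /** Returns m * v for a homogeneous column vector v. */
    static float[] apply(float[][] m, float[] v) {
        float[] result = new float[4];
        for (int i = 0; i < 4; i++) {
            for (int k = 0; k < 4; k++) {
                result[i] += m[i][k] * v[k];
            }
        }
        return result;
    }

    /** S * T: move the box center to the origin, then scale every axis to [-1, 1]. */
    static float[][] fromScaleAndTranslate(float l, float r, float b, float t, float n, float f) {
        float[][] scale = {
            {2f / (r - l), 0, 0, 0},
            {0, 2f / (t - b), 0, 0},
            {0, 0, 2f / (f - n), 0},
            {0, 0, 0, 1}
        };
        float[][] translate = {
            {1, 0, 0, -(l + r) / 2f},
            {0, 1, 0, -(t + b) / 2f},
            {0, 0, 1, -(f + n) / 2f},
            {0, 0, 0, 1}
        };
        return multiply(scale, translate);
    }

    /** The closed-form matrix P from the text. */
    static float[][] closedForm(float l, float r, float b, float t, float n, float f) {
        return new float[][] {
            {2f / (r - l), 0, 0, -(r + l) / (r - l)},
            {0, 2f / (t - b), 0, -(t + b) / (t - b)},
            {0, 0, 2f / (f - n), -(f + n) / (f - n)},
            {0, 0, 0, 1}
        };
    }
}
```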

Applying this matrix to a vector $$\vec{a}=(a_x,a_y,a_z,1)^{\top}$$ gives

$$P\vec{a}= \begin{pmatrix} \frac{2a_x}{right-left} - \frac{right+left}{right-left} \\ \frac{2a_y}{top-bottom}-\frac{top+bottom}{top-bottom} \\ \frac{2a_z}{far-near} - \frac{far+near}{far-near} \\ 1 \end{pmatrix}$$

A quick check with $$a_x = left$$, $$a_y = top$$ and $$a_z = near$$, i.e. a corner of the viewing volume, confirms that $$P$$ scales and translates into clip space correctly. Computing $$P\vec{a}$$ gives

$$\begin{pmatrix} \frac{2left-right-left}{right-left} \\ \frac{2top-top-bottom}{top-bottom} \\ \frac{2near-far-near}{far-near} \\ 1 \end{pmatrix}= \begin{pmatrix} \frac{left-right}{-1(left-right)} \\ \frac{top-bottom}{top-bottom} \\ \frac{near-far}{-1(near-far)} \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ -1 \\ 1\end{pmatrix}$$

which maps this corner of the viewing volume to the corresponding corner $$(-1, 1, -1)$$ of the clip-space cube.

It is important to understand the transformation of a point into clip space for the orthographic case, because we are going to use the same principle for the perspective projection case.

## Perspective Projection Matrix

Perspective viewing is simulated by so-called foreshortening: the farther away a point is from the camera, the closer its projection moves toward the center of the view, so that distant objects appear smaller. Let $$x_p$$ and $$y_p$$ be the 2D coordinates on the projection plane, which lies at distance $$near$$ in front of the camera, i.e. at $$z=-near$$. Since the camera looks along the negative z axis, a visible point has $$z<0$$ and lies at distance $$-z$$ from the camera. With similar triangles,

$$\frac{x_p}{near}=\frac{x}{-z} \quad \rightarrow \quad x_p=x\frac{near}{-z}$$

and

$$\frac{y_p}{near}=\frac{y}{-z} \quad \rightarrow \quad y_p=y\frac{near}{-z}$$

So much for simulating foreshortening. But we still need to deal with clipping. In the perspective case, the viewing volume is not a box but a frustum (a truncated pyramid), so its mapping to the clip-space cube has to take foreshortening into account. We therefore use the above foreshortening equations in our clip space mapping.

We already know how to scale and translate a point into clip space. Substituting $$x_p$$ and $$y_p$$ into the first two rows of $$P\vec{a}$$, with $$left$$, $$right$$, $$bottom$$ and $$top$$ now describing the edges of the near plane, and reusing $$a_x$$ and $$a_y$$ for the resulting clip-space coordinates, we get the following equations

$$\begin{aligned} a_x &= \frac{2 x \frac{near}{-z}}{right-left} -\frac{right+left}{right-left} \\ a_y &= \frac{2 y \frac{near}{-z}}{top-bottom} - \frac{top+bottom}{top-bottom} \end{aligned}$$
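The similar-triangles step used in these equations can be sketched on its own. The class below is hypothetical and assumes, as in the derivation, that the camera looks along $$-z$$ and that $$near$$ is a positive distance. Doubling the distance of a point halves its projected coordinates, which is exactly the foreshortening effect.

```java
/** Hypothetical helper: projects a camera-space point onto the plane z = -near. */
final class NearPlaneProjection {

    /**
     * Returns (x_p, y_p). The camera looks along -z, so a visible point has z < 0
     * and its distance from the camera is -z.
     */
    static float[] project(float x, float y, float z, float near) {
        if (z >= 0f) {
            throw new IllegalArgumentException("point must lie in front of the camera (z < 0)");
        }
        float scale = near / -z; // similar triangles: x_p / near = x / (-z)
        return new float[] {x * scale, y * scale};
    }
}
```

For instance, `project(2, 1, -4, 1)` gives `(0.5, 0.25)`, and moving the same point to `z = -8` gives `(0.25, 0.125)`.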

The problem is that $$a_x$$ and $$a_y$$ are divided by $$z$$, and a matrix-vector multiplication cannot express a division by one of the vertex's own coordinates. This keeps us from setting up a simple matrix-vector multiplication as we did in the orthographic case. Instead, we apply a really neat trick: we rewrite every coordinate as a numerator divided by $$-z$$, let the matrix compute only the numerators, and leave the division to the perspective divide that OpenGL performs automatically afterwards. This is what the perspective divide and normalized device coordinates are about. Recall that in homogeneous coordinates the following two vectors represent the same point:

$$\begin{pmatrix}a_x \\ a_y \\ a_z \\ a_{\omega} \end{pmatrix} \sim \begin{pmatrix}a_x /a_{\omega} \\ a_y/a_{\omega} \\ a_z/a_{\omega} \\ 1 \end{pmatrix}$$

That means we need to restructure all projection mappings so that each of them is a fraction with denominator $$-z$$, and we need the last component $$a_{\omega}$$ of the transformed vertex to take the value $$-z$$.

We start by rewriting $$a_x$$ so that it is divided by $$-z$$:

$$\begin{aligned} a_x &= \frac{2 x \frac{near}{-z}}{right-left} -\frac{right+left}{right-left} \\ & = \frac{2x \times near}{-z(right-left)} -\frac{-z(right+left)}{-z(right-left)} \\ &= x \frac{\frac{2near}{right-left}}{-z} + \frac{z\frac{right+left}{right-left}}{-z}\end{aligned}$$

We restructure $$a_y$$ in the same manner:

$$a_y = y \frac{\frac{2near}{top-bottom}}{-z} +\frac{z\frac{top+bottom}{top-bottom}}{-z}$$
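A quick numerical check that the restructuring did not change the value of $$a_x$$: the hypothetical class below computes $$a_x$$ once directly (project, then scale and translate) and once as the matrix row's numerator divided by $$\omega = -z$$. Both results agree up to floating-point rounding.

```java
/** Hypothetical helper: compares the direct a_x with the "numerator over -z" form. */
final class RestructuredX {

    /** a_x computed directly: project onto the near plane, then scale and translate. */
    static float direct(float x, float z, float left, float right, float near) {
        float xp = x * near / -z;
        return 2f * xp / (right - left) - (right + left) / (right - left);
    }

    /** a_x computed as (first matrix row . (x, y, z, 1)) / w with w = -z. */
    static float viaHomogeneous(float x, float z, float left, float right, float near) {
        float numerator = x * (2f * near / (right - left)) + z * ((right + left) / (right - left));
        float w = -z; // what the last matrix row (0, 0, -1, 0) produces
        return numerator / w;
    }
}
```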

To see where this is going, let us take a first look at the final projection matrix. The division by $$-z$$ does not appear in it; instead, the last row is $$(0, 0, -1, 0)$$, so that after the matrix-vector multiplication $$a_{\omega}$$ is set to $$-z$$. The first two rows hold the coefficients of the numerators derived above.

$$\begin{pmatrix} \frac{2near}{right-left} & 0 & \frac{right+left}{right-left} & 0\\ 0 & \frac{2near}{top-bottom} & \frac{top+bottom}{top-bottom} & 0 \\ 0 & 0 & d & q \\ 0 & 0 & -1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ 1\end{pmatrix}$$

We have not said anything yet about the $$z$$ coordinate, whose unknown entries are denoted $$d$$ and $$q$$ in the matrix above. Reading the third row off the matrix and taking the later perspective divide into account, we get

$$a_z = \frac{d \times z+q}{-z}$$

As in the orthographic case, we plug in the boundaries: the near plane $$z=-near$$ must map to $$-1$$ and the far plane $$z=-far$$ must map to $$+1$$. This gives two equations with two unknowns:

$$\begin{aligned} \frac{-d \times near+q}{near} & = -1 & \rightarrow & \quad d = \frac{q}{near}+1 \\ \frac{-d\times far+q}{far} &= 1 & \rightarrow & \quad q = far(1+d) \end{aligned}$$

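Substituting the first equation into the second gives $$q = far\left(2 + \frac{q}{near}\right)$$, i.e. $$q\,\frac{near-far}{near} = 2\,far$$. Solving for $$q$$, and then for $$d$$ (which you can easily do by hand), we get

$$\begin{aligned} d &= -\frac{far+near}{far-near} \\ q &= -\frac{2 \times far \times near}{far-near}\end{aligned}$$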
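The following Java sketch (hypothetical names) plugs $$d$$ and $$q$$ into $$a_z = \frac{d z + q}{-z}$$ to check where depths end up. With $$near = 1$$ and $$far = 3$$, the near plane maps to $$-1$$, the far plane to $$+1$$, and the plane at $$z = -2$$, halfway in between, maps to $$0.5$$ rather than $$0$$: depth after the perspective divide is not linear in $$z$$.

```java
/** Hypothetical helper: the depth coefficients d and q and the resulting NDC depth. */
final class DepthCoefficients {

    /** d = -(far + near) / (far - near); near and far are positive distances. */
    static float d(float near, float far) {
        return -(far + near) / (far - near);
    }

    /** q = -2 * far * near / (far - near). */
    static float q(float near, float far) {
        return -2f * far * near / (far - near);
    }

    /** Depth after the perspective divide: (d * z + q) / -z, for a camera-space z < 0. */
    static float ndcDepth(float z, float near, float far) {
        return (d(near, far) * z + q(near, far)) / -z;
    }
}
```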

Substituting $$d$$ and $$q$$ completes the projection matrix:

$$\begin{pmatrix} \frac{2near}{right-left} & 0 & \frac{right+left}{right-left} & 0\\ 0 & \frac{2near}{top-bottom} & \frac{top+bottom}{top-bottom} & 0 \\ 0 & 0 & -\frac{far+near}{far-near} & -\frac{2 \times far \times near}{far-near} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
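Putting it all together, here is a sketch that builds this matrix as a row-major `float[4][4]`, multiplies it with a camera-space point and performs the perspective divide. Class and method names are hypothetical, and the row-major layout is chosen only because it mirrors the notation above. The corners of the frustum land on the corners of the clip-space cube; for example, with $$left=bottom=-1$$, $$right=top=1$$, $$near=1$$ and $$far=10$$, the far top-right corner $$(10, 10, -10)$$ maps to $$(1, 1, 1)$$.

```java
/** Hypothetical helper: the perspective projection matrix (row-major) and its use. */
final class PerspectiveFrustum {

    /** Builds the matrix from the text; near and far are positive distances. */
    static float[][] frustum(float l, float r, float b, float t, float n, float f) {
        return new float[][] {
            {2f * n / (r - l), 0, (r + l) / (r - l), 0},
            {0, 2f * n / (t - b), (t + b) / (t - b), 0},
            {0, 0, -(f + n) / (f - n), -2f * f * n / (f - n)},
            {0, 0, -1, 0}
        };
    }

    /** Multiplies m with (x, y, z, 1), then divides by the resulting w (= -z). */
    static float[] toNdc(float[][] m, float x, float y, float z) {
        float[] v = {x, y, z, 1f};
        float[] clip = new float[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                clip[row] += m[row][col] * v[col];
            }
        }
        return new float[] {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3]};
    }
}
```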

We are done. Yay! Now you understand the projection matrix and how to set it up.

OpenGL obviously does much more than clipping and mapping to normalized device coordinates, but this post is only about the projection matrix. In my humble opinion, once you understand the viewing and projection matrices, the rest is not difficult and, honestly, not that important.

## Code Examples

```java
/**
 * Constructs a perspective projection matrix from the six clipping planes.
 * @param left   x-coordinate of the left clipping plane on the near plane (camera space)
 * @param right  x-coordinate of the right clipping plane on the near plane (camera space)
 * @param bottom y-coordinate of the lower clipping plane on the near plane (camera space)
 * @param top    y-coordinate of the upper clipping plane on the near plane (camera space)
 * @param near   distance from the camera to the near clipping plane (positive)
 * @param far    distance from the camera to the far clipping plane (positive, greater than near)
 */
public Matrix4f perspectiveFrustum(
        float left, float right,
        float bottom, float top,
        float near, float far)
{
    Matrix4f projection = new Matrix4f();

    // note the signature: set(COLUMN, ROW, value)
    // it may be different in the matrix implementation that you use
    // the entries not set below must be 0; this holds whether new Matrix4f()
    // starts as the identity or as the zero matrix, since every diagonal entry is set

    projection.set(0, 0, (2f*near)/(right-left) );
    projection.set(2, 0, (right+left)/(right-left) );

    projection.set(1, 1, (2f*near)/(top-bottom) );
    projection.set(2, 1, (top+bottom)/(top-bottom) );

    projection.set(2, 2, -(far+near)/(far-near) );
    projection.set(3, 2, -2f*(far*near)/(far-near) );

    projection.set(2, 3, -1);
    projection.set(3, 3, 0);

    return projection;
}
```
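If you do not want to depend on a particular matrix class, the same entries can be written into a flat `float[16]` in column-major order, where entry (column, row) lives at index `column * 4 + row`. That is the layout `glUniformMatrix4fv` expects when its `transpose` argument is `GL_FALSE`. The class below is a dependency-free sketch with a hypothetical name, not a replacement for whatever library you use.

```java
/** Hypothetical dependency-free variant of perspectiveFrustum writing a column-major float[16]. */
final class FrustumColumnMajor {

    /** Stores value at (column, row), mirroring the set(COLUMN, ROW, value) convention. */
    private static void set(float[] m, int column, int row, float value) {
        m[column * 4 + row] = value;
    }

    static float[] perspectiveFrustum(float left, float right, float bottom, float top,
                                      float near, float far) {
        float[] m = new float[16]; // Java initializes all entries to 0
        set(m, 0, 0, 2f * near / (right - left));
        set(m, 2, 0, (right + left) / (right - left));
        set(m, 1, 1, 2f * near / (top - bottom));
        set(m, 2, 1, (top + bottom) / (top - bottom));
        set(m, 2, 2, -(far + near) / (far - near));
        set(m, 3, 2, -2f * far * near / (far - near));
        set(m, 2, 3, -1f);
        // m[15] (column 3, row 3) stays 0
        return m;
    }
}
```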


It is usually more intuitive to set up the projection matrix with a field-of-view angle.

```java
/**
 * Constructs a perspective projection matrix from a vertical field-of-view angle.
 * @param viewAngle vertical field-of-view angle in degrees
 * @param width the width of the viewport (only the ratio width/height is used)
 * @param height the height of the viewport (only the ratio width/height is used)
 * @param nearClippingPlaneDistance distance to the near clipping plane (projection plane)
 * @param farClippingPlaneDistance distance to the far clipping plane
 */
public Matrix4f projection(
        float viewAngle,
        float width, float height,
        float nearClippingPlaneDistance, float farClippingPlaneDistance)
{
    // convert angle from degrees to radians
    final float radians = (float) (viewAngle * Math.PI / 180f);

    // half the height of the near plane: near * tan(viewAngle / 2)
    final float halfHeight = (float) Math.tan(radians / 2f) * nearClippingPlaneDistance;

    // half the width of the near plane, scaled by the aspect ratio
    final float halfScaledAspectRatio = halfHeight * (width / height);

    return perspectiveFrustum(
            -halfScaledAspectRatio, halfScaledAspectRatio,
            -halfHeight, halfHeight,
            nearClippingPlaneDistance, farClippingPlaneDistance);
}
```
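Written out, the field-of-view setup uses $$top = near \cdot \tan(\frac{fovy}{2})$$ and $$right = top \cdot aspect$$, where $$fovy$$ is the vertical viewing angle and $$aspect = \frac{width}{height}$$. Substituting into the frustum matrix gives the diagonal entries $$\frac{f}{aspect}$$ and $$f$$ with $$f = \cot(\frac{fovy}{2})$$, the form used by `gluPerspective`. The hypothetical class below computes the entries both ways so they can be compared.

```java
/** Hypothetical helper: field-of-view frustum bounds versus the gluPerspective closed form. */
final class FovProjection {

    /** Returns {right, top} of a symmetric frustum; left = -right and bottom = -top. */
    static float[] bounds(float fovyDegrees, float aspect, float near) {
        float top = (float) Math.tan(Math.toRadians(fovyDegrees) / 2.0) * near;
        return new float[] {top * aspect, top};
    }

    /** Entry (0,0) of the frustum matrix: 2 * near / (right - left). */
    static float xScale(float fovyDegrees, float aspect, float near) {
        float right = bounds(fovyDegrees, aspect, near)[0];
        return 2f * near / (right - (-right));
    }

    /** Entry (1,1) of the frustum matrix: 2 * near / (top - bottom). */
    static float yScale(float fovyDegrees, float aspect, float near) {
        float top = bounds(fovyDegrees, aspect, near)[1];
        return 2f * near / (top - (-top));
    }

    /** gluPerspective's f = cot(fovy / 2). */
    static float cotHalfFovy(float fovyDegrees) {
        return (float) (1.0 / Math.tan(Math.toRadians(fovyDegrees) / 2.0));
    }
}
```

Note that the near distance cancels out of both entries, which is why `gluPerspective` can express them through the angle and the aspect ratio alone.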