In the previous article we learned about spaces and how to position and orient objects in world space by applying transformation matrices to them. We also learned about camera space, which is simply another coordinate system within world space.

Recall that during rendering, a vertex first gets mapped from world space into camera space and is then projected onto the 2D view plane using a projection matrix (roughly speaking). In this post, we deal with the question of how to set up the camera space by positioning and orienting the camera, and how to derive a matrix from it that maps a vertex from world space into camera space. In OpenGL, this matrix (called the “view matrix”) plays a big role and has to be specified in every OpenGL program.

## Camera Space

The coordinate system of the camera, as discussed in the previous section, is spanned by three orthonormal vectors \( \vec{u}, \vec{v},\vec{w}\in \mathbb{R}^3\). The position of the camera is defined by its focal point (or eye) named \(\vec{c}\) in world space. Note that \(\vec{c}\) is also the origin of the camera coordinate system. By convention the camera looks in the direction \( -\vec{w} \), where \(\vec{w}\) is most often calculated by subtracting a “look-at” point \(\vec{l}\) from the focal point and normalizing the result

\[ \vec{w} = \frac{\vec{c} - \vec{l}}{\| \vec{c} - \vec{l} \|} \]

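Next we compute \(\vec{u}\) and \(\vec{v}\) with the help of an “up-vector” that points roughly upwards in world space. Here we use (0, 0, 1), which treats the world z-axis as up; the up-vector must not be parallel to \(\vec{w}\). Using the cross product and normalizing the results, we obtain the vectors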

\[ \begin{aligned} \vec{u} & = \frac{(0,0,1)^{\top} \times \vec{w}}{\| (0,0,1)^{\top} \times \vec{w} \|} \\ \vec{v} & = \frac{\vec{w} \times \vec{u}}{\| \vec{w} \times \vec{u} \|} \end{aligned} \]

which, together with \(\vec{w}\), span our new orthonormal camera coordinate system.
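The construction above can be sketched in a few lines of plain Java. This is only a minimal illustration using `double[]` arrays instead of a vector class; the class name `CameraBasis` and its helper methods are hypothetical and not part of the implementation used later in this post.

```java
/** Minimal sketch: build the camera basis (u, v, w) from eye, look-at point and up-vector. */
final class CameraBasis {

    /** Component-wise difference a - b. */
    static double[] sub(double[] a, double[] b) {
        return new double[] { a[0] - b[0], a[1] - b[1], a[2] - b[2] };
    }

    /** Cross product a x b. */
    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    /** Returns a / ||a||; assumes a is not the zero vector. */
    static double[] normalize(double[] a) {
        double len = Math.sqrt(a[0] * a[0] + a[1] * a[1] + a[2] * a[2]);
        return new double[] { a[0] / len, a[1] / len, a[2] / len };
    }

    /** Returns {u, v, w}; assumes up is not parallel to the viewing direction. */
    static double[][] basis(double[] eye, double[] lookAt, double[] up) {
        double[] w = normalize(sub(eye, lookAt)); // the camera looks along -w
        double[] u = normalize(cross(up, w));     // sideways axis
        double[] v = normalize(cross(w, u));      // camera "up" axis
        return new double[][] { u, v, w };
    }

    public static void main(String[] args) {
        double[][] b = basis(new double[] { 5, -5, 8 },
                             new double[] { 3, 4, 0 },
                             new double[] { 0, 0, 1 });
        for (double[] axis : b) {
            System.out.println(java.util.Arrays.toString(axis));
        }
    }
}
```

Since \(\vec{w}\) and \(\vec{u}\) are already orthogonal unit vectors, the last normalization is mathematically redundant and only guards against rounding errors.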

## Camera Transformation

Now that we have set up the camera space, we need to construct a matrix that maps from world space into camera space. More concretely, to map a given vertex \(\vec{a}\) from world space to camera space, we apply the following two steps:

- translate \(\vec{a}\) with respect to the camera position, and then
- map the translated point into the coordinate system \(\vec{u},\vec{v},\vec{w}\).

These two steps will later be combined into one matrix, which is then called the **camera transformation**.

The translation part is fairly easy. With our given camera position \(\vec{c}\), we use a translation matrix \(T\) to move \(\vec{a}\) relative to the camera position

\[ T(-\vec{c})\vec{a} = \begin{pmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 &0& -c_y \\ 0 & 0 &1& -c_z\\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix}a_x\\a_y\\a_z\\1\end{pmatrix}=\begin{pmatrix}a_x-c_x\\a_y-c_y\\a_z-c_z\\1\end{pmatrix}\]

Ok. Now for the mapping part, we have two options: either we set up a rotation matrix that rotates the vertex into place in camera space, which would require determining the angles between the coordinate axes of the two systems so that we can rotate the point accordingly, or we make use of a wonderful trick that is applicable when dealing with orthonormal coordinate systems.

We are going to do the latter.

Let us quickly review the definition of the dot product, which says that for two given vectors \(\vec{a}\), \(\vec{b}\), where \(\|\vec{b}\|=1\), the dot product yields

\[ \vec{a} \cdot \vec{b} = \| \vec{a}\| \cos(\theta)\]

where \(\theta\) is the angle between the two vectors. In other words, the dot product yields the (signed) length of the orthogonal projection of \(\vec{a}\) onto \(\vec{b}\), that is, the factor by which the unit vector \(\vec{b}\) has to be scaled to reach the point where \(\vec{a}\) is projected onto it. This may be a little confusing, which is why I elaborate on the dot product in more depth in the math appendix.
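A small worked example may help here. The numbers are chosen purely for illustration: take \(\vec{a} = (3, 4, 0)^{\top}\) and the unit vector \(\vec{b} = (1, 0, 0)^{\top}\). Then

\[ \vec{a} \cdot \vec{b} = 3 \cdot 1 + 4 \cdot 0 + 0 \cdot 0 = 3, \qquad \|\vec{a}\| = \sqrt{3^2 + 4^2} = 5, \qquad \cos(\theta) = \frac{3}{5} \]

so \(\|\vec{a}\| \cos(\theta) = 3\) as well. The value 3 is exactly how far \(\vec{a}\) reaches along the direction of \(\vec{b}\).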

Now, \(\vec{u}\), \(\vec{v}\) and \(\vec{w}\) are unit vectors that are orthogonal to each other, meaning that

\[ \vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{w} = \vec{w} \cdot \vec{u} = 0\]

holds true, so that they form a so-called **orthonormal basis** in world space. This allows us to set up a mapping matrix that “rotates” a point from world space into camera space by simply computing the dot product between the point and each coordinate axis vector

\[ R(\vec{u},\vec{v},\vec{w})\vec{a} = \begin{pmatrix} u_x & u_y & u_z & 0 \\ v_x & v_y & v_z & 0 \\ w_x & w_y & w_z & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix}a_x\\a_y\\a_z\\1\end{pmatrix}=\begin{pmatrix}\vec{u} \cdot \vec{a} \\ \vec{v} \cdot \vec{a} \\ \vec{w} \cdot \vec{a}\\1 \end{pmatrix}\]

Now that we have both transformations ready, we are able to multiply them into one matrix \(V = R(\vec{u},\vec{v},\vec{w})T(-\vec{c}) \). This is called the **view matrix**.

\[ V = \begin{pmatrix} u_x & u_y & u_z & -c_{x}u_x-c_{y}u_y-c_{z}u_z \\ v_x & v_y & v_z & -c_{x}v_x-c_{y}v_y-c_{z}v_z \\ w_x & w_y & w_z & -c_{x}w_x-c_{y}w_y-c_{z}w_z \\ 0 & 0 & 0 & 1 \end{pmatrix}\]
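To double-check the closed form above, one can multiply \(R\) and \(T\) numerically and compare the result entry by entry. The following is a minimal sketch under a few assumptions: matrices are stored as row-major `double[4][4]` arrays, and the class `ViewMatrixProduct` with its method names is hypothetical, chosen only for this illustration.

```java
/** Minimal sketch: V = R(u, v, w) * T(-c), compared against the closed form. */
final class ViewMatrixProduct {

    /** Rotation part: the rows are the camera axes u, v, w. */
    static double[][] rotation(double[] u, double[] v, double[] w) {
        return new double[][] {
            { u[0], u[1], u[2], 0 },
            { v[0], v[1], v[2], 0 },
            { w[0], w[1], w[2], 0 },
            { 0,    0,    0,    1 }
        };
    }

    /** Translation part T(-c): moves the camera position c into the origin. */
    static double[][] translation(double[] c) {
        return new double[][] {
            { 1, 0, 0, -c[0] },
            { 0, 1, 0, -c[1] },
            { 0, 0, 1, -c[2] },
            { 0, 0, 0, 1 }
        };
    }

    /** Plain 4x4 matrix product a * b. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] r = new double[4][4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 4; k++) {
                    r[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return r;
    }

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    /** Closed form of V: the last column holds -c.u, -c.v and -c.w. */
    static double[][] closedForm(double[] u, double[] v, double[] w, double[] c) {
        return new double[][] {
            { u[0], u[1], u[2], -dot(c, u) },
            { v[0], v[1], v[2], -dot(c, v) },
            { w[0], w[1], w[2], -dot(c, w) },
            { 0,    0,    0,    1 }
        };
    }
}
```

For any orthonormal \(\vec{u},\vec{v},\vec{w}\) and camera position \(\vec{c}\), `multiply(rotation(u, v, w), translation(c))` and `closedForm(u, v, w, c)` should agree up to rounding. Note that the order matters: the translation is applied first, so it is the right-hand factor.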

The result of applying the view matrix \(V\) to \(\vec{a} \) in world space is a new set of coordinates \((a'_x, a'_y, a'_z)\) in camera space. It is important to understand that these coordinates are just scaling coefficients for the axes \(\vec{u},\vec{v},\vec{w}\), which themselves lie in world space. The linear combination \(a'_x\vec{u}+ a'_y\vec{v}+a'_z\vec{w}\), taken from the camera origin \(\vec{c}\), describes the position of the point. But the point has not been “moved”; its position is now just described relative to the origin and orientation of the camera space.
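The relation between camera-space coordinates and the unchanged world-space position can be made explicit with a short derivation. The symbols \(\vec{a}'\) (the camera-space coordinates of \(\vec{a}\)) and \(R_3\) (the upper-left 3×3 block of \(R\)) are notation introduced only for this sketch. Because the rows of \(R_3\) are orthonormal, its transpose is its inverse, and the columns of \(R_3^{\top}\) are \(\vec{u},\vec{v},\vec{w}\):

\[ \begin{aligned} \vec{a}' & = R_3(\vec{a}-\vec{c}) \\ R_3^{\top}\vec{a}' & = \vec{a}-\vec{c} \\ \vec{a} & = \vec{c} + a'_x\vec{u} + a'_y\vec{v} + a'_z\vec{w} \end{aligned} \]

So the camera-space coordinates tell us how far to walk along each camera axis, starting at \(\vec{c}\), to arrive at the same point in world space.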

## Code Example

Let us apply the above in code. I make use of my own vector and matrix implementations, which are very similar to the countless other implementations that can be found on the web.

Note that I use homogeneous coordinates right from the beginning.

```
// the position of the camera, called 'eye'
Vector3f c = new Vector3f(5, -5, 8);
Vector3f u = new Vector3f();
Vector3f v = new Vector3f();
Vector3f w = new Vector3f();
// compute the "negative" look direction by subtracting
// the look-at point (3,4,0) from c
w.subAndAssign(c, new Vector3f(3, 4, 0)); // w = c - (3,4,0)
w.normalize();
// compute cross product
u.crossAndAssign(new Vector3f(0, 0, 1), w); // side = (0,0,1) x w
u.normalize();
v.crossAndAssign(w, u); // up = side x look
v.normalize();
Matrix4f rotation = new Matrix4f(); // identity
rotation.setIdentity();
// note the format: set(COLUMN, ROW, value)
// it may be different for your matrix implementation
rotation.set(0, 0, u.x);
rotation.set(1, 0, u.y);
rotation.set(2, 0, u.z);
rotation.set(0, 1, v.x);
rotation.set(1, 1, v.y);
rotation.set(2, 1, v.z);
rotation.set(0, 2, w.x);
rotation.set(1, 2, w.y);
rotation.set(2, 2, w.z);
Matrix4f translation = new Matrix4f(); // identity
translation.set(3, 0, -c.x);
translation.set(3, 1, -c.y);
translation.set(3, 2, -c.z);
// view matrix
Matrix4f view = new Matrix4f();
view.multAndAssign(rotation, translation); // view = rotation * translation
// print matrix on console
view.print();
```

At the end of the code snippet we print the matrix to the console. This is the output:

\[ \begin{pmatrix} 0.9761871 & 0.21693046 & 0.0 & -3.7962832 \\ -0.1421731 & 0.6397789 & 0.75529456 & -2.1325965 \\ 0.16384639 & -0.73730874 & 0.65538555 & -9.748859 \\ 0.0 & 0.0 & 0.0 & 1.0 \end{pmatrix} \]

If you are building your own Matrix class, I recommend incorporating a `lookAt` method that sets up the matrix. I also recommend creating a camera object that caches the \(\vec{u}\), \(\vec{v}\) and \(\vec{w}\) vectors and provides general helper methods for positioning, pitch, yaw, roll and whatever else you may need (e.g. a matrix stack).
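As a rough idea of what such a `lookAt` method could look like, here is a minimal, self-contained sketch. It is not the implementation used in this post: the class name `LookAt`, the use of plain `double` arrays and the row-major `double[4][4]` layout are all assumptions made for illustration. If your graphics API expects column-major storage, the matrix would have to be transposed accordingly.

```java
/** Minimal sketch of a lookAt helper that fills in the view matrix directly. */
final class LookAt {

    static double dot(double[] a, double[] b) {
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] {
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]
        };
    }

    /** Returns a / ||a||; assumes a is not the zero vector. */
    static double[] normalize(double[] a) {
        double len = Math.sqrt(dot(a, a));
        return new double[] { a[0] / len, a[1] / len, a[2] / len };
    }

    /**
     * Builds the view matrix for a camera at eye looking towards target.
     * Assumes eye != target and that up is not parallel to the viewing direction.
     */
    static double[][] lookAt(double[] eye, double[] target, double[] up) {
        double[] w = normalize(new double[] {
            eye[0] - target[0], eye[1] - target[1], eye[2] - target[2] });
        double[] u = normalize(cross(up, w));
        double[] v = normalize(cross(w, u));
        // rotation rows u, v, w; last column is the rotated, negated eye position
        return new double[][] {
            { u[0], u[1], u[2], -dot(eye, u) },
            { v[0], v[1], v[2], -dot(eye, v) },
            { w[0], w[1], w[2], -dot(eye, w) },
            { 0,    0,    0,    1 }
        };
    }

    public static void main(String[] args) {
        double[][] view = lookAt(new double[] { 5, -5, 8 },
                                 new double[] { 3, 4, 0 },
                                 new double[] { 0, 0, 1 });
        for (double[] row : view) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```

With the same eye and look-at point as in the code example, the rows should match the matrix printed above up to floating-point precision.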

## 2 Comments

## Zirian · April 23, 2014 at 10:07 pm

could you please specify more briefly what are u,v, & w vectors, and regarding the up vector shouldn’t this be (0,1,0) ???

## Khalid · September 1, 2014 at 2:04 pm

Hi,

Thank you for the tutorial.

Suppose that the lookAt vector is 0,0,1 (World Z Axis) and the camera up vector is 0,1,0 (World Y Axis ), in order to calculate the camera coordinates we do the following:

1- Camera Z axis: is the lookAt vector

2- Camera X axis: is the cross product of the lookAt and the Camera up vector

3- Camera Y axis: is the Cross product of Camera X and Camera Z

Now assume that the camera up vector is tilted a little in the YZ plane so it is 0,1,1, now the cross product of the lookAt and the up vector will give a vector in the same direction as the case when the camera up vector was 0,1,0 so the camera Y axis will have the same direction as the camera up vector 0,1,0 case. Then the rotation matrix that transfers from the world coordinates to camera coordinates will have the same values, which will give a similar view as if the camera is not tilted.

If this is true then there is something wrong, because when the camera is tilted the view should change. What is wrong in my understanding?