We shall begin.
Let me welcome you, fellow nerd! You are about to embark on a wonderful journey full of geekiness and technical awesomeness. Computer graphics, especially 3D graphics is the pinnacle of nerdcraft. It has been fascinating gamers, artists and developers since the first photon has been shot on a screen. Together with game development, 3D computer graphics belongs to the most challenging parts of computer science, as it involves a lot of math, a deep understanding of underlying hardware, meticulous and efficient programming and a sense for beauty.
This (lengthy) tutorial and its following articlesa re aimed for game developers, which means programmers, not artists. In particular, it is aimed at modern OpenGL. OpenGL 3.x (and 4.x*) becomes increasingly popular due to the rais of mobile devices with native OpenGL support, WebGL and popular indie games such as Minecraft that has for instance been developed in Java and OpenGL.
Screenshot of Doom, developed by ID Software in 1993. Doom was one of the first extremely popular 3D games. 3D graphics has much evolved since than, as did the graphics hardware. However, the same main principles still apply until today.
Most current OpenGL tutorials such as NeHe, etc. cover old OpenGL versions. While these tutorials are excellently explaining 3D graphics, OpenGL became much more restrictive in newer versions and with the introduction of shaders, OpenGL has very little left of the initial command set (not to mention the mandatory use of low-level data structures), so that I think that yet another OpenGL tutorial makes sense.
The graphics that you see on your screen when playing a video game or watching a rendered movie such as Shrek, is obviously artificial. The images are neither a photography of some real-world scene, nor a painting that somebody drew by hand. Rather, these images that (often) try to be as realistic as possible, or at least visually appealing as possible, have obviuosly been generated by some Hocus Pocus in the form of an algorithm that takes some wacky input and outputs a beautiful 2D image.
This is pretty much what we are trying to accomplish on a veeeery basic level: projecting a mathematical description of a 3D scene onto a 2D image (screen).
Let us split the Hocus Pocus intro three parts:
- The Input - Geometry and Meshes
- The Algorithm - Raytracing and Rasterization
This will be the structure of the remainder of this article.
The Input - Geometry and Meshes
The input is always a three dimensional, mathematical description of the world (or a scene), including all objects, their positions, the definition of its appearance, light, shadows and so on. The world is accuratley described through geometry, formulas, materials, light intensities and other physical metrics to fully simulate a scene. All objects have im common, that they ahve a position and orientation in the scene. The form and material may differ on the other hand. The simplest object is probably a sphere which can be described by a mid-point and a radius only. A box, slightly more complex, has a height, width and depth. But how to describe a face or a space ship hull?
In 3D computer graphics, eventually all geometric objects are described as a mesh of triangles. This has several advantages over a mathematical, reduced description of an object. With a triangle mesh,
- you are free to modell anything (car, cat, cup, ...),
- you can employ the same data structures to deal with triangle meshes,
- triangles have nice mathematical properties that make them efficient to deal with in 3D graphics.
A dolphin drawn as low-resolution mesh of triangles. Triangle Meshes allow to describe pretty much every geometric object and are efficient to deal with for current 3D graphics systems. The image is borrowed from Wikipedia.
A triangle consists of three vertices (corner points) and three edges. The position of each vertex defines the the position and orientation of the triangle in the scene. The triangles in a mesh often share some vertices so that changing the position of on vertex will have an effect on the orientation of all attached triangles. This is important to understand, because OpenGL natively uses specialized data structures to deal with triangle meshes.
A polygon is often referred to as a face. A triangle is the simplest polygon, so it is also referred to as face. A face has an orientation in space and can be shaded (drawn). In order to shade a triangle, you need color information. That is why it is quite common to provide shading information along with the position of each vertex. Such shading information may a plain color, but could also be a material, reflection properties, light emittance, etc.
The Algorithm - Raytracing and Rasterization
In order to get an image of the 3D world, you need to position a camera in the scene. You can think this camera to be part of the physics simulation: It captures the rays of light that are reflected from the objects in the scene and focuses them behind a lens leading to a sharp 2D projection of the 3D scene. This image is what is what you see on the screen. A camera requires three vectors to describe its position and orientation in the scene: is the position, the viewing direction and a vector that points upwards so that the image is not vertically flipped.
As a matter of fact, most animation movies such as those from Pixar are rendered via methods of tracing the rays of light in the scene into the camera. This concept is called Raytracing or Ray Casting. Well, actually, raytracing shoots rays of "sigh" from the focal point of the camera through each pixel until it intersects an object on your scene where the material, light, shades, texture, reflections, etc. are taken into account to compute the color of the pixel. The advantage of raytracing is, that it yields very realistic results. For example shadows can easily be computed by checking whether the light source is visible for a given intersection point. If not, the intersection point lays in the shade.
Raytracing shoots a ray through each pixel in the projected image into the scene. The point where it hits an object defines the color of the pixel. This projects the 3D scene onto a 2D image. The pyramid shaped corpus is the space that is seen by the camera (also called "Viewing Frustum"). The frustum causes objects that are further away from the camera to be smaller projected onto the plan than objects that are closer.
The illusion of spatial depth is caused by the effect that objects that are closer to the camera appear larger than those that are further away from the camera. This is important to understand in order to understand how to simulate 3D viewing. In analogy to our camera example, this effect is caused by focussing the rays of sight in the focal point. This is called perspective projection. The further they "travel" from the camera, the more they drift away. On the projection plan however, the objects that are further away cover less pixels than those that are closer so that they appear smaller in size. With several different sized objects in the scene, the image fakes the illusion of seeing into a real 3D scene.
Orthogonal projection projects objects in the 2D projection plane without taking the distance into consideration. In our Raytracing example this is the equivalent to not emitting the rays from the focal point (camera) but rather perpendicular to the projection plane.
To illustrate this point, let us assume for a moment, that we deliberately not focus the rays in the focal point. Instead, we shoot the rays orthogonal to the projection plane. By that we lose the information of depth and we speak of orthogonal projection because we shoot the rays orthogonal to the projection plane. There a some use cases where orthogonal projection makes sense such as blueprints or technical illustrations. Orthogonal projections is also employed in old-school map-based games (eg. "Tycoon games") such as Open Transport Tycoon and sometimes referred to as "2.5D". While these games where actually sprite based (not projection based), they still nicely illustrate the illusion of an orthogonal projection.
Open Transport Tycoon is a map based game that uses orthogonal projection (objects further away have same size like those close to the camera). It is internally actually implemented as a simply sprite based map, where the (partly transparent) sprites are drawn from back to front to get the occlusion right. This method is also referred to as 2.5D and enables a high degree of visual detail because the sprites are just statically drawn to the screen.
However, let us come back to perspective projection with Raytracing. In pseudo-code, the raytracing algorithm looks like follows.
Let's assume we have our camera positioned at the focal point , where is the focal point, the viewing direction as a unit vector and is the "up vector" which is the projection planes vertical orientation in the scene. Two scalars and are used to address pixels in the projection plane, eg. both ranging from to with a step width of .
1) Let be the horizontal orientation of the projection plane. 2) For each pixel we shoot a ray through the projection plane 2.1) Initialize closest intersection 2.2) For each object in the scene, compute 2.2.1) Keep closest intersection for that satisfies 2.3) The color of pixel is then taken from intersection point
However, Raytracing is computationally expensive, because each pixel has to be considered individually which means shooting dozens of thousands "rays" into a scene, testing for collision with potentially many objects and computing the color reflection (casting additional light-rays) off the object that has been hit. Since companies such as Pixar do not require to render their movies in real-time, they can trade better quality for longer computation time. In addition, Raytracing can usually be parallelized.
"Bobby" from Clement Granjon modeled and rendered with Blender. Example of an image rendered with raytracing. Smoke, lighting, depth-of-field and skin illustrate the power of current open-source image generation techniques.
With progressing hardware capabilities, real-time Raytracing may eventually become feasible, most video games still use a method called Rasterization. It is kind of the same principle in the sense that you project stuff on the projection plane. But instead of computing all pixels one-by-one, it projects entire triangles on the projection plane and draws the triangle as a whole. And this is already pretty much what OpenGL does. OpenGL is basically a framework for providing powerful tools for very efficient rasterization. Rasterization provides are large performance benefit over Raytracing for the cost of less level of detail.
So the first job is to to project the triangle vertices onto the projection plane. The algorithm then checks and discards triangles that lay outside of the rectangular screen area. This is referred to as culling. Clipping occurs if the triangle only partly overlaps the screen, where the vertices are still mapped, by the triangle gets only partly drawn. As part of the projection calculation, rasterization keeps track of the distance of the vertices between the camera and each vertex. This depth information is required in a second for the depth buffer, which is a secondary memory space allocated to be as large as the buffer that later holds the pixels color information of the screen.
The next job is to actually draw the triangle. To do so, the algorithm computes each pixel individually that lays between the projected vertices on the projection plane. For each pixel, it also computes the "distance of the pixel" by interpolating the distances between the vertices. It checks the distance of each pixel against the depth buffer to make sure that there is no pixel occluding the one that is currently processed. If that's not the case, the pixel is drawn and the new distance is written to the depth buffer. The final color of a pixel can be computed through numerous ways and is often application dependent. We shall, however, assume for simplicity reasons at the moment that the color is also an interpolation between the color passed along with the vertices. Once the color is computed, it is written to the screen.
A triangle is projected from the 3D scene onto the 2D viewing plane (screen). This is (very) basically how rasterization works. Rasterization projects the three vertices of a triangle onto the screen and shades the pixels that lay between the vertices.
As you can see, there are basically two steps in Rasterization. The first step is mapping the vertices of a triangle onto the projection plane. The second step is drawing the pixels on the projection plane that lay between the projected vertices. This is actually what we will discuss in-depth in later chapters as we discuss Vertex Shaders (projection vertices) and Fragement Shaders (drawing pixels). Both shader types need to be provided (usually as source code) by the programmer and therefore can be heavily customized to the application that makes use of the shaders.
A slightly more structured overview over the algorithm is provided by the following pseudo-code. I deliberately avoid a formal notation in this pseudo-code because it is important that the general principle becomes clear. We will be having enough math in later chapters. Trust me
1) for each triangle mesh in the scene 1.1) for each triangle in 1.1.1) project the three vertices onto projection plane and keep track of distance to camera 1.1.2) for each pixel between the projected triangle vertices, 126.96.36.199) interpolate depth of pixel between vertices 188.8.131.52) if depth buffer contains no closer pixel, draw pixel on screen and and write distance to depth-buffer
Rasterization can also be parallelized nicely on triangle level so that each triangle can be dealt with by an individual thread/core. The nerds among the readers will notice however that this approach may lead to race-conditions in the depth buffer as the one single buffer that is shared and written to, which we will neglect for the moment.
As you can see, the outer loop in the rasterization setting iterates over all triangles in the scene. The inner loop iterates for each triangle over the pixels that need to be drawn. In contrast to that, Raytracing has an outer loop over pixels and an inner loop over objects to test for intersection, which is usually heavily optimized through spatial trees, however it may still be an inner loop over all triangles of all meshes in the scene in the worst case. The costly part is to test for intersections in the inner loop, because it usually involves solving equations, while rasterization just needs to project triangles from 3D to 2D and test whether the triangle needs to be clipped. This is basically why Rasterizaton is faster than Raytracing.
Alright, we are done with the overview. Please leave a comment if you have questions. I will try to answer them as soon as possible.
There is More...
The "there is more..." section always lists additional stuff that is related and may be interesting additional reading. It is not officially part of the tutorial, but just additional stuff on the topic.
"Lifted", a short movie from Pixar. One of my favorites and awesome example of animated movies.
- Learning Modern 3D Graphics: another very nice and recent 3D graphics tutorial written by Jason L. McKesson.