Schabby's Blog
OpenGL, Java, Cassandra and other stuff that totally makes the world go round

This blog post explains a common and versatile approach to OpenGL picking called "ray picking". An example in pseudo-code is given below.

Picking is the process of finding objects in your scene based on user input. Most commonly, you want to determine what object a user has clicked with his mouse. The 2D mouse coordinates serve as a reference on the view port to identify the projected objected that has been clicked. A similar scenario is a first person shooter, where the gun is basically the picking pointer. If you shoot the gun, the trajectory of the bullet is traced through the scene and collisions are detected, similar to a laser pointer shooting a ray through a scene until it hits an object and marks it with a small red dot.

Screenshot OpenGL Picking

A screenshot showing a hex map in an LWJGL application with a selected tile as an example for 3D picking

OpenGL Selection Mode vs. Ray Picking

There are two common ways of implementing picking. The first one makes use of a special OpenGL feature in which you render your scene in a "selection mode". In selection mode, you render your scene a few pixels around your mouse cursor. Also, you don't render to the frame buffer but to a special depth buffer from which you later read-out previously labeled polygons.

The second approach is the aformentioned "Ray Picking" and is independent from OpenGL. As outlined above, it simulates a line that is shot through the scene until it hits an object. I personally prefer this approach for the following reasons:

  • You don't need an additional render pass. A render pass can take some considerable amount of time and special clipping/culling code is required to only render the small part of the scene which you want to perform the picking on. In addition, you render a less complex scene (omitting textures, lighting, normals, etc) in selection mode if possible which can add a high degree of complexity to your drawing routines (imagine breaking up display lists). Ray picking obviously requires some code to trace the ray through your scene, but this is usually less complex.
  • Ray picking can be done independently from the rest of the game code as a simultaneous and "read-only" operation on the scene graph. In some environments such as AWT/JOGL where the input devices are handled by a separate thread, your picking code can run as part of the handler thread and does not impact the performance of the rest of the game. This comes in handy especially in scenarios where you need to perform picking very often (eg. mouse hovering, target laser on the in-game gun).
  • In pretty much all Java environments, calling OpenGL takes a considerable amount of time due to JNI. It is therefore beneficial to minimize the number OpenGL calls. Rendering in selection mode would mean additional OpenGL calls that I rather substitute with ray picking.

However, there are scenarios where ray picking is simply inferior or impractical; Consider a scene with a lot of small objects (eg. tree with little branches and leaves) or if you modify your geometry through shaders (eg. Level of Detail, Splines, etc.). Usually, these are scenaries where the rendering pipeline due to its design is more efficient than iterating over all objects on a scene and testing on intersection.

Picking by Ray Tracing

Ray Picking

Ray picking is the process of shooting a line (ie. "ray") from the camera through the 2D viewscreen (where the 3D scene is projected on) into the scene until it hits an object. To do so, we need to know the camera and the point on the viewscreen (for example mouse cursor position). The first point is trivial, however the second point is a bit more difficult to determine. Let's say we have a 2D point on the viewscreen (x, y) and now want to map this point into world coordinates. One way would be to make use of the inverted viewing matrix, but we follow a more geometrical approach: We compute the position of the plane of the viewscreen in world space and map the 2D point on this plane and from there into world space.

The algorithm in pseudo code follows below. Note that cameraLookAt is the 3D point where the camera looks at (as used on glLookat), cameraPosition is the current position of the camera in world space and cameraUp is the up vector of the camera.

x = getMouseX() // scalar
y = getMouseY() // scalar

view = cameraLookAt - cameraPosition // 3D float vector
normalize view

h = crossProduct( view, cameraUp ) // 3D float vector
normalize h

v = crossProduct( h, view) // 3D float vector
normalize v

ok, so far for h and v. Note that v is actually pointing in the same direction like the camera-up vector, but we compute it via the cross product to illustrate the method.

The vectors h and v are currently normalized to the length of 1. We now need to compute the lengths of h and v in the view port. In my example I assume that you set up your frustum by a field of view angle in degrees (fovy) as commonly the case with glFrustum. Let nearClippingDistance be the distance to the near clipping plane and width/height the ratio of the view port width divided by the viewport heigth.

// convert fovy to radians 
rad = fovy * PI / 180
vLength = tan( rad / 2 ) * nearClippingPlaneDistance
hLength = vLength * (width / height)

scale v by vLength
scale h by hLength

The two scalars vLength and hLength in combinaton with v and h help spanning the view port plane extending from the center point. At this point it is worth mentioning that it may make sense to only compute h and v when you update the camera orientation because the normalization involves a square root which is a relatively expensive operation (actually tan may also be avoided).

Now it is time to map the 2D mouse coordinates onto the view port plane.

// translate mouse coordinates so that the origin lies in the center
// of the view port
x -= width / 2
y -= height / 2

// scale mouse coordinates so that half the view port width and height
// becomes 1
y /= (height / 2)
x /= (width / 2)

// linear combination to compute intersection of picking ray with
// view port plane
pos = cameraPos + view*nearClippingPlaneDistance + h*x + v*y

// compute direction of picking ray by subtracting intersection point
// with camera position
dir = pos - cameraPos

That's pretty much it. We now got pos as the intersection point of the picking ray in the view port plane and the picking ray direction dir. This describes a line that can now be used to intersect with the individual objects of the scene.

// brute force
for all objects in the scene 
  test for intersection and keep closest
end for

So by intersecting the picking ray with all your visible objects in your scene while keeping track of the distance, you determine the object that is hit by your mouse cursor.
Checking all objects in our scene may not be very efficient though. So in a first attempt I recommend to test only objects that lie (partly) within your viewing frustum. Under normal circumstances you got that code already in your scene graph as part of the software culling routine when pumping obects into the render pipeline, but if you build your scene graph / game engine from scratch, you will need to deal with culling yourself.

Source Code

I received many requests for the source code (many more than I actually expected). I therefore provide the code in the following part alongside with some explanatory comments. Let's first have a look at two classes that we are going to need. The first one is a simple 3D Vector representation as floats. We will use it to represent the vectors in the later part. The vector class is certainly no rocket-science, so you may want to just briefly scan over it.

public class Vector3f 
{
	public float x, y, z;
 
	// Constructors as well as getters/setters omitted for brevity!!
	// Only important methods kept necessary for this tutorial.
	// The original class contains many more methods...
 
	public Vector3f add(Vector3f a) {
		x += a.x;
		y += a.y;
		z += a.z;
 
		return this;
	}
 
	public Vector3f set(Vector3f v)	{
		this.x = v.x;
		this.y = v.y;
		this.z = v.z;
 
		return this;
	}
 
	public Vector3f subAndAssign(Vector3f a, Vector3f b) {
		x = a.x - b.x;
		y = a.y - b.y;
		z = a.z - b.z;
 
		return this;
	}
 
	/**
	 * Returns the length of the vector, also called L2-Norm or Euclidean Norm.
	 */
	public float l2Norm() {
		return (float) Math.sqrt(x*x+y*y+z*z);
	}
 
	public Vector3f crossAndAssign(Vector3f a, Vector3f b) {
		float tempX = a.y * b.z - a.z * b.y;
		float tempY = a.z * b.x - a.x * b.z;
		float tempZ = a.x * b.y - a.y * b.x;
 
		x = tempX;
		y = tempY;
		z = tempZ;
 
		return this;
	}
 
	public Vector3f scale(float scalar) {
		x *= scalar;
		y *= scalar;
		z *= scalar;
 
		return this;
	}
 
	public Vector3f normalize() {
		float length = l2Norm();
		x /= length;
		y /= length;
		z /= length;
 
		return this;
	}
}

So far for our vector helper class. It should be too surprising. Similar classes can be found in many libs, including LWJGL.

The next class is actually important. It represents a line, more specifially the picking ray. In this case the line is represented by a point and a direction.

public class PickingRay 
{
	private Vector3f clickPosInWorld = new Vector3f();
	private Vector3f direction = new Vector3f();
 
	/**
	 * Computes the intersection of this ray with the X-Y Plane (where Z = 0)
	 * and writes it back to the provided vector.
	 */
	public void intersectionWithXyPlane(float[] worldPos)
	{
		float s = -clickPosInWorld.z / direction.z;
		worldPos[0] = clickPosInWorld.x+direction.x*s;
		worldPos[1] = clickPosInWorld.y+direction.y*s;
		worldPos[2] = 0;
	}
 
	public Vector3f getClickPosInWorld() {
		return clickPosInWorld;
	}
	public Vector3f getDirection() {
		return direction;
	}	
}

We are almost there. We just need to define the remaining prerequisites. You need to set them up before computing the actual picking ray:

  • position is a 3d vector (Vector3f) holding the world position of the camera
  • view is a 3d vector (Vector3f) holding the viewing direction of the camera
  • pickingRay is the helper class given above, basically it is a line with starting point PickingRay.clickPosInWorld and a direction PickingRay.direction. You need to instantiate it.
  • two scalars viewportWidth and viewportHeight being the width and height of the current view port.
  • two scalars screenX and screenY are the mouse coordinates on your 2D screen (origin in lower left corner)

Now we are able to compute the screen in world space. Note that screenHoritzontally and screenVertically correspond to h and v in the pseudo code above.

        // look direction
        view.subAndAssign(lookAt, position).normalize();
 
        // screenX
        screenHoritzontally.crossAndAssign(view, up).normalize();
 
        // screenY
        screenVertically.crossAndAssign(screenHoritzontally, view).normalize();
 
        final float radians = (float) (viewAngle*Math.PI / 180f);
        float halfHeight = (float) (Math.tan(radians/2)*nearClippingPlaneDistance);
        float halfScaledAspectRatio = halfHeight*getViewportAspectRatio();
 
        screenVertically.scale(halfHeight);
        screenHoritzontally.scale(halfScaledAspectRatio);

We now have the screen position in world space. lookAt is the point (!) in world coordinates where the camera looks at, so that view becomes the viewing direction. The scalar nearClippingPlaneDistance is the distance to the near clipping plane of your viewing frustum. Yes, I am still using pre-4.0 OpenGL code here ;)
Btw, you need to execute the code above only if you change the camera settings (view direction, position, etc.) to update the screen position in world space. Many frames are often rendered without updating the screen position, so that this may be a nice performance optimization (well, actually the vector normalization (square root) and tan() is a bit expensive).

Finally, in order to compute the actual picking lines, you need to do the following (screenX and screenY are the 2D mouse coordinates):

    public void picking(float screenX, float screenY, PickingRay pickingRay)
    {
        pickingRay.getClickPosInWorld().set(position);
        pickingRay.getClickPosInWorld().add(view);
 
        screenX -= (float)viewportWidth/2f;
        screenY -= (float)viewportHeight/2f;
 
        // normalize to 1
        screenX /= ((float)viewportWidth/2f);
        screenY /= ((float)viewportHeight/2f);
 
        pickingRay.getClickPosInWorld().x += screenHoritzontally.x*screenX + screenVertically.x*screenY;
        pickingRay.getClickPosInWorld().y += screenHoritzontally.y*screenX + screenVertically.y*screenY;
        pickingRay.getClickPosInWorld().z += screenHoritzontally.z*screenX + screenVertically.z*screenY;
 
        pickingRay.getDirection().set(pickingRay.getClickPosInWorld());
        pickingRay.getDirection().sub(position);
    }

Done! Now you can use the pickingRay instance to check for intersections with the objects in your scene.

I hope this helps, please leave a comment if you like the tutorial or if you jave a question.

Cheers,

Johannes


55 Antworten

  1. Bropocalypse says:

    Actually, scratch that, isn't cameraLookAt what we're trying to find? How is it we can use it at the start of the algorithm?

  2. Aleduc says:

    Hi there,

    I really like this explanation of ray picking and would like to see the source code you produced. Can you send it to dadmaleduc at yahoo dot es?

    Thanks and nice blog :)

    Diego,
    Duque

  3. csharp100 says:

    Is it possible that you would send me the source code for the whole project? I'm trying to figure where to put what. I have a flying cubes program I am working on and other sources are pretty vague.

  4. schabby says:

    Hey guys! I am thrilled about the positive feedback about my ray-picking post. It came a long way :) However, I see that many people still struggle with the concept and I have to admit they my post is probably not elaborate enough on code level. That's why I revisted the post and tried to make it more specific/clear. I hope this helps.

    As for the source code, I am not sure if it helps to post a complete program. The thing with GL programs is that rendering (shaders, geometry, buffers, etc.) requires awful amounts of boiler plate code, so that the actual picking part would only comprise 3% of the example which in turn makes the example difficult to read not and less concise regarding the picking matter.

    Best, Johannes

  5. Jamie Guillemette says:

    The reason ppl are asking for source is your example is dependant on logic that is not explained.

    example : pos = cameraPos + view*nearClippingPlaneDistance + h*x + v*y

    a vector object can not be added or multiplied by a float.

    so we would like to see how you accomplished some of your boiler plating prior to doing the picking.

    Dont post the whole project. Just the source relevant to the the topic of picking and unprojecting the mouse.

    j.

Post Comment

Please notice: Comments are moderated by an Admin.