This blog post explains a common and versatile approach to OpenGL picking called "ray picking". An example in pseudo-code is given below.

Picking is the process of finding objects in your scene based on user input. Most commonly, you want to determine what object a user has clicked with his mouse. The 2D mouse coordinates serve as a reference on the view port to identify the projected objected that has been clicked. A similar scenario is a first person shooter, where the gun is basically the picking pointer. If you shoot the gun, the trajectory of the bullet is traced through the scene and collisions are detected, similar to a laser pointer shooting a ray through a scene until it hits an object and marks it with a small red dot.

### OpenGL Selection Mode vs. Ray Picking

There are two common ways of implementing picking. The first one makes use of a special OpenGL feature in which you render your scene in a "selection mode". In selection mode, you render your scene a few pixels around your mouse cursor. Also, you don't render to the frame buffer but to a special depth buffer from which you later read-out previously labeled polygons.

The second approach is the aformentioned "Ray Picking" and is independent from OpenGL. As outlined above, it simulates a line that is shot through the scene until it hits an object. I personally prefer this approach for the following reasons:

- You don't need an additional render pass. A render pass can take some considerable amount of time and special clipping/culling code is required to only render the small part of the scene which you want to perform the picking on. In addition, you render a less complex scene (omitting textures, lighting, normals, etc) in selection mode if possible which can add a high degree of complexity to your drawing routines (imagine breaking up display lists). Ray picking obviously requires some code to trace the ray through your scene, but this is usually less complex.
- Ray picking can be done independently from the rest of the game code as a simultaneous and "read-only" operation on the scene graph. In some environments such as AWT/JOGL where the input devices are handled by a separate thread, your picking code can run as part of the handler thread and does not impact the performance of the rest of the game. This comes in handy especially in scenarios where you need to perform picking very often (eg. mouse hovering, target laser on the in-game gun).
- In pretty much all Java environments, calling OpenGL takes a considerable amount of time due to JNI. It is therefore beneficial to minimize the number OpenGL calls. Rendering in selection mode would mean additional OpenGL calls that I rather substitute with ray picking.

However, there are scenarios where ray picking is simply inferior or impractical; Consider a scene with a lot of small objects (eg. tree with little branches and leaves) or if you modify your geometry through shaders (eg. Level of Detail, Splines, etc.). Usually, these are scenaries where the rendering pipeline due to its design is more efficient than iterating over all objects on a scene and testing on intersection.

### Ray Picking

Ray picking is the process of shooting a line (ie. "ray") from the camera through the 2D viewscreen (where the 3D scene is projected on) into the scene until it hits an object. To do so, we need to know the camera and the point on the viewscreen (for example mouse cursor position). The first point is trivial, however the second point is a bit more difficult to determine. Let's say we have a 2D point on the viewscreen (x, y) and now want to map this point into world coordinates. One way would be to make use of the inverted viewing matrix, but we follow a more geometrical approach: We compute the position of the plane of the viewscreen in world space and map the 2D point on this plane and from there into world space.

The algorithm in pseudo code follows below. Note that `cameraLookAt` is the 3D point where the camera looks at (as used on `glLookat`), `cameraPosition` is the current position of the camera in world space and `cameraUp` is the up vector of the camera.

x = getMouseX() // scalar y = getMouseY() // scalar view = cameraLookAt - cameraPosition // 3D float vector normalize view h = crossProduct( view, cameraUp ) // 3D float vector normalize h v = crossProduct( h, view) // 3D float vector normalize v

ok, so far for `h` and `v`. Note that `v` is actually pointing in the same direction like the camera-up vector, but we compute it via the cross product to illustrate the method.

The vectors `h` and `v` are currently normalized to the length of 1. We now need to compute the lengths of `h` and `v` in the view port. In my example I assume that you set up your frustum by a field of view angle in degrees (`fovy`) as commonly the case with `glFrustum`. Let `nearClippingDistance` be the distance to the near clipping plane and `width/height` the ratio of the view port width divided by the viewport heigth.

// convert fovy to radians rad = fovy * PI / 180 vLength = tan( rad / 2 ) * nearClippingPlaneDistance hLength = vLength * (width / height) scale v by vLength scale h by hLength

The two scalars `vLength` and `hLength` in combinaton with `v` and `h` help spanning the view port plane extending from the center point. At this point it is worth mentioning that it may make sense to only compute `h` and `v` when you update the camera orientation because the normalization involves a square root which is a relatively expensive operation (actually `tan` may also be avoided).

Now it is time to map the 2D mouse coordinates onto the view port plane.

// translate mouse coordinates so that the origin lies in the center // of the view port x -= width / 2 y -= height / 2 // scale mouse coordinates so that half the view port width and height // becomes 1 y /= (height / 2) x /= (width / 2) // linear combination to compute intersection of picking ray with // view port plane pos = cameraPos + view*nearClippingPlaneDistance + h*x + v*y // compute direction of picking ray by subtracting intersection point // with camera position dir = pos - cameraPos

That's pretty much it. We now got `pos` as the intersection point of the picking ray in the view port plane and the picking ray direction `dir`. This describes a line that can now be used to intersect with the individual objects of the scene.

// brute force for all objects in the scene test for intersection and keep closest end for

So by intersecting the picking ray with all your visible objects in your scene while keeping track of the distance, you determine the object that is hit by your mouse cursor.

Checking all objects in our scene may not be very efficient though. So in a first attempt I recommend to test only objects that lie (partly) within your viewing frustum. Under normal circumstances you got that code already in your scene graph as part of the software culling routine when pumping obects into the render pipeline, but if you build your scene graph / game engine from scratch, you will need to deal with culling yourself.

## Source Code

I received many requests for the source code (many more than I actually expected). I therefore provide the code in the following part alongside with some explanatory comments. Let's first have a look at two classes that we are going to need. The first one is a simple 3D Vector representation as floats. We will use it to represent the vectors in the later part. The vector class is certainly no rocket-science, so you may want to just briefly scan over it.

public class Vector3f { public float x, y, z; // Constructors as well as getters/setters omitted for brevity!! // Only important methods kept necessary for this tutorial. // The original class contains many more methods... public Vector3f add(Vector3f a) { x += a.x; y += a.y; z += a.z; return this; } public Vector3f set(Vector3f v) { this.x = v.x; this.y = v.y; this.z = v.z; return this; } public Vector3f subAndAssign(Vector3f a, Vector3f b) { x = a.x - b.x; y = a.y - b.y; z = a.z - b.z; return this; } /** * Returns the length of the vector, also called L2-Norm or Euclidean Norm. */ public float l2Norm() { return (float) Math.sqrt(x*x+y*y+z*z); } public Vector3f crossAndAssign(Vector3f a, Vector3f b) { float tempX = a.y * b.z - a.z * b.y; float tempY = a.z * b.x - a.x * b.z; float tempZ = a.x * b.y - a.y * b.x; x = tempX; y = tempY; z = tempZ; return this; } public Vector3f scale(float scalar) { x *= scalar; y *= scalar; z *= scalar; return this; } public Vector3f normalize() { float length = l2Norm(); x /= length; y /= length; z /= length; return this; } } |

So far for our vector helper class. It should be too surprising. Similar classes can be found in many libs, including LWJGL.

The next class is actually important. It represents a line, more specifially the picking ray. In this case the line is represented by a point and a direction.

public class PickingRay { private Vector3f clickPosInWorld = new Vector3f(); private Vector3f direction = new Vector3f(); /** * Computes the intersection of this ray with the X-Y Plane (where Z = 0) * and writes it back to the provided vector. */ public void intersectionWithXyPlane(float[] worldPos) { float s = -clickPosInWorld.z / direction.z; worldPos[0] = clickPosInWorld.x+direction.x*s; worldPos[1] = clickPosInWorld.y+direction.y*s; worldPos[2] = 0; } public Vector3f getClickPosInWorld() { return clickPosInWorld; } public Vector3f getDirection() { return direction; } } |

We are almost there. We just need to define the remaining prerequisites. You need to set them up before computing the actual picking ray:

`position`is a 3d vector (`Vector3f`) holding the world position of the camera`view`is a 3d vector (`Vector3f`) holding the viewing direction of the camera`pickingRay`is the helper class given above, basically it is a line with starting point`PickingRay.clickPosInWorld`and a direction`PickingRay.direction`. You need to instantiate it.- two scalars
`viewportWidth`and`viewportHeight`being the width and height of the current view port. - two scalars
`screenX`and`screenY`are the mouse coordinates on your 2D screen (origin in lower left corner)

Now we are able to compute the screen in world space. Note that `screenHoritzontally` and `screenVertically` correspond to `h` and `v` in the pseudo code above.

// look direction view.subAndAssign(lookAt, position).normalize(); // screenX screenHoritzontally.crossAndAssign(view, up).normalize(); // screenY screenVertically.crossAndAssign(screenHoritzontally, view).normalize(); final float radians = (float) (viewAngle*Math.PI / 180f); float halfHeight = (float) (Math.tan(radians/2)*nearClippingPlaneDistance); float halfScaledAspectRatio = halfHeight*getViewportAspectRatio(); screenVertically.scale(halfHeight); screenHoritzontally.scale(halfScaledAspectRatio); |

We now have the screen position in world space. `lookAt` is the point (!) in world coordinates where the camera looks at, so that `view` becomes the viewing direction. The scalar `nearClippingPlaneDistance` is the distance to the near clipping plane of your viewing frustum. Yes, I am still using pre-4.0 OpenGL code here ðŸ˜‰

Btw, you need to execute the code above only if you change the camera settings (view direction, position, etc.) to update the screen position in world space. Many frames are often rendered without updating the screen position, so that this may be a nice performance optimization (well, actually the vector normalization (square root) and `tan()` is a bit expensive).

Finally, in order to compute the actual picking lines, you need to do the following (`screenX` and `screenY` are the 2D mouse coordinates):

public void picking(float screenX, float screenY, PickingRay pickingRay) { pickingRay.getClickPosInWorld().set(position); pickingRay.getClickPosInWorld().add(view); screenX -= (float)viewportWidth/2f; screenY -= (float)viewportHeight/2f; // normalize to 1 screenX /= ((float)viewportWidth/2f); screenY /= ((float)viewportHeight/2f); pickingRay.getClickPosInWorld().x += screenHoritzontally.x*screenX + screenVertically.x*screenY; pickingRay.getClickPosInWorld().y += screenHoritzontally.y*screenX + screenVertically.y*screenY; pickingRay.getClickPosInWorld().z += screenHoritzontally.z*screenX + screenVertically.z*screenY; pickingRay.getDirection().set(pickingRay.getClickPosInWorld()); pickingRay.getDirection().sub(position); } |

Done! Now you can use the `pickingRay` instance to check for intersections with the objects in your scene.

I hope this helps, please leave a comment if you like the tutorial or if you jave a question.

Cheers,

Johannes

Actually, scratch that, isn't cameraLookAt what we're trying to find? How is it we can use it at the start of the algorithm?

Hi there,

I really like this explanation of ray picking and would like to see the source code you produced. Can you send it to dadmaleduc at yahoo dot es?

Thanks and nice blog

Diego,

Duque

Is it possible that you would send me the source code for the whole project? I'm trying to figure where to put what. I have a flying cubes program I am working on and other sources are pretty vague.

Hey guys! I am thrilled about the positive feedback about my ray-picking post. It came a long way However, I see that many people still struggle with the concept and I have to admit they my post is probably not elaborate enough on code level. That's why I revisted the post and tried to make it more specific/clear. I hope this helps.

As for the source code, I am not sure if it helps to post a complete program. The thing with GL programs is that rendering (shaders, geometry, buffers, etc.) requires awful amounts of boiler plate code, so that the actual picking part would only comprise 3% of the example which in turn makes the example difficult to read not and less concise regarding the picking matter.

Best, Johannes

The reason ppl are asking for source is your example is dependant on logic that is not explained.

example : pos = cameraPos + view*nearClippingPlaneDistance + h*x + v*y

a vector object can not be added or multiplied by a float.

so we would like to see how you accomplished some of your boiler plating prior to doing the picking.

Dont post the whole project. Just the source relevant to the the topic of picking and unprojecting the mouse.

j.

anyone tried? has application? mio can send to me? how these methods should be implemented ... class .. where exactly called, put? I need this for 3D objects ...

I am the type of person that learns best when I see a working program and can cut and paste from the original until it does what I need.

PLEASE post a complete working program.

Hi,

Thank you for this tutorial, it's quite clear and very helpful. I implemented it and it works very well with a perspective camera, but not with an orthographic one, which I guess makes sense.

How would you go about implementing picking with an orthographic camera?

Thanx a lot Schabby for that article. I was searching for a simple and quick way to do picking and you article was very helpful.

NÃ©stor