This blog post explains a common and versatile approach to OpenGL picking called "ray picking". An example in pseudo-code is given below.

Picking is the process of finding objects in your scene based on user input. Most commonly, you want to determine which object a user has clicked with the mouse. The 2D mouse coordinates serve as a reference on the view port to identify the projected object that has been clicked. A similar scenario is a first-person shooter, where the gun is basically the picking pointer. If you fire the gun, the trajectory of the bullet is traced through the scene and collisions are detected, much like a laser pointer shooting a ray through a scene until it hits an object and marks it with a small red dot.

### OpenGL Selection Mode vs. Ray Picking

There are two common ways of implementing picking. The first one makes use of a special OpenGL feature in which you render your scene in a "selection mode". In selection mode, you render only the few pixels around your mouse cursor. Also, you don't render to the frame buffer but into a special selection buffer from which you later read back the previously labeled primitives that were hit.

The second approach is the aforementioned "ray picking" and is independent of OpenGL. As outlined above, it simulates a line that is shot through the scene until it hits an object. I personally prefer this approach for the following reasons:

- You don't need an additional render pass. A render pass can take a considerable amount of time, and special clipping/culling code is required to render only the small part of the scene on which you want to perform the picking. In addition, you will want to render a less complex scene (omitting textures, lighting, normals, etc.) in selection mode where possible, which can add a high degree of complexity to your drawing routines (imagine breaking up display lists). Ray picking obviously requires some code to trace the ray through your scene, but this is usually less complex.
- Ray picking can be done independently from the rest of the game code as a simultaneous, "read-only" operation on the scene graph. In some environments such as AWT/JOGL, where the input devices are handled by a separate thread, your picking code can run as part of the handler thread and does not impact the performance of the rest of the game. This comes in handy especially in scenarios where you need to perform picking very often (e.g. mouse hovering, a target laser on the in-game gun).
- In pretty much all Java environments, calling OpenGL takes a considerable amount of time due to JNI. It is therefore beneficial to minimize the number of OpenGL calls. Rendering in selection mode would mean additional OpenGL calls that I would rather substitute with ray picking.

However, there are scenarios where ray picking is simply inferior or impractical: consider a scene with a lot of small objects (e.g. a tree with little branches and leaves), or geometry that you modify through shaders (e.g. level of detail, splines, etc.). These are usually scenarios where the rendering pipeline, due to its design, is more efficient than iterating over all objects in a scene and testing for intersection.

### Ray Picking

Ray picking is the process of shooting a line (i.e. a "ray") from the camera through the 2D view screen (onto which the 3D scene is projected) into the scene until it hits an object. To do so, we need to know the camera position and the point on the view screen (for example, the mouse cursor position). The first point is trivial; the second point is a bit more difficult to determine. Let's say we have a 2D point (x, y) on the view screen and now want to map this point into world coordinates. One way would be to make use of the inverted viewing matrix, but we follow a more geometrical approach: we compute the position of the view-screen plane in world space, map the 2D point onto this plane, and from there into world space.

The algorithm in pseudo code follows below. Note that `cameraLookAt` is the 3D point the camera looks at (as used in `gluLookAt`), `cameraPosition` is the current position of the camera in world space, and `cameraUp` is the up vector of the camera.

```
x = getMouseX()                        // scalar
y = getMouseY()                        // scalar

view = cameraLookAt - cameraPosition   // 3D float vector
normalize view

h = crossProduct( view, cameraUp )     // 3D float vector
normalize h

v = crossProduct( h, view )            // 3D float vector
normalize v
```

So far for `h` and `v`. Note that `v` actually points in the same direction as the camera-up vector, but we compute it via the cross product to illustrate the method.

The vectors `h` and `v` are currently normalized to a length of 1. We now need to compute the lengths of `h` and `v` in the view port. In my example I assume that you set up your frustum with a field-of-view angle in degrees (`fovy`), as is commonly the case with `gluPerspective`. Let `nearClippingPlaneDistance` be the distance to the near clipping plane and `width / height` the ratio of the view port width divided by the view port height.

```
// convert fovy to radians
rad = fovy * PI / 180

vLength = tan( rad / 2 ) * nearClippingPlaneDistance
hLength = vLength * (width / height)

scale v by vLength
scale h by hLength
```

The two scalars `vLength` and `hLength`, in combination with `v` and `h`, span the view port plane extending from its center point. At this point it is worth mentioning that it may make sense to compute `h` and `v` only when you update the camera orientation, because the normalization involves a square root, which is a relatively expensive operation (actually, the `tan` may also be avoided).
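As a small sanity check, the two lengths can be sketched in plain Java (the class and method names are made up for this example; the names mirror the pseudo code):

```java
public class ViewportExtents {

    /** Half-height of the near plane for a vertical field of view given in degrees. */
    static double vLength(double fovyDegrees, double nearClippingPlaneDistance) {
        double rad = Math.toRadians(fovyDegrees);   // fovy * PI / 180
        return Math.tan(rad / 2.0) * nearClippingPlaneDistance;
    }

    /** Half-width follows from the aspect ratio (width / height). */
    static double hLength(double fovyDegrees, double nearClippingPlaneDistance, double aspect) {
        return vLength(fovyDegrees, nearClippingPlaneDistance) * aspect;
    }

    public static void main(String[] args) {
        // With fovy = 90 degrees and near = 1, tan(45 degrees) = 1, so the half-height is 1.
        System.out.println(vLength(90.0, 1.0));
        System.out.println(hLength(90.0, 1.0, 16.0 / 9.0));
    }
}
```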

Now it is time to map the 2D mouse coordinates onto the view port plane.

```
// translate mouse coordinates so that the origin lies in the center
// of the view port
x -= width / 2
y -= height / 2

// scale mouse coordinates so that half the view port width and height
// becomes 1
y /= (height / 2)
x /= (width / 2)

// linear combination to compute intersection of picking ray with
// view port plane
pos = cameraPos + view*nearClippingPlaneDistance + h*x + v*y

// compute direction of picking ray by subtracting intersection point
// with camera position
dir = pos - cameraPos
```

That's pretty much it. We now have `pos`, the intersection point of the picking ray with the view port plane, and the picking ray direction `dir`. Together they describe a line that can now be intersected with the individual objects of the scene.

```
// brute force
for all objects in the scene
    test for intersection and keep closest
end for
```

So by intersecting the picking ray with all your visible objects in your scene while keeping track of the distance, you determine the object that is hit by your mouse cursor.
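The brute-force loop can be sketched in Java, assuming each scene object is approximated by a bounding sphere (the `Sphere` record, the scene list, and the return convention here are assumptions of this example, not part of the original post):

```java
import java.util.List;

public class BruteForcePick {

    record Sphere(double cx, double cy, double cz, double r) {}

    /** Returns the smallest non-negative ray parameter t where the ray hits the sphere, or -1. */
    static double intersect(double[] pos, double[] dir, Sphere s) {
        double ox = pos[0] - s.cx(), oy = pos[1] - s.cy(), oz = pos[2] - s.cz();
        double a = dir[0]*dir[0] + dir[1]*dir[1] + dir[2]*dir[2];
        double b = 2 * (ox*dir[0] + oy*dir[1] + oz*dir[2]);
        double c = ox*ox + oy*oy + oz*oz - s.r()*s.r();
        double disc = b*b - 4*a*c;
        if (disc < 0) return -1;              // ray misses the sphere
        double sq = Math.sqrt(disc);
        double t = (-b - sq) / (2*a);         // nearer of the two solutions
        if (t < 0) t = (-b + sq) / (2*a);     // ray starts inside the sphere
        return t < 0 ? -1 : t;
    }

    /** Tests every object and keeps the closest hit, as in the pseudo code above. */
    static Sphere pick(double[] pos, double[] dir, List<Sphere> scene) {
        Sphere closest = null;
        double best = Double.POSITIVE_INFINITY;
        for (Sphere s : scene) {
            double t = intersect(pos, dir, s);
            if (t >= 0 && t < best) { best = t; closest = s; }
        }
        return closest;
    }

    public static void main(String[] args) {
        double[] pos = {0, 0, 0}, dir = {0, 0, -1};
        Sphere near = new Sphere(0, 0, -5, 1), far = new Sphere(0, 0, -20, 1);
        System.out.println(pick(pos, dir, List.of(near, far)) == near); // the closer sphere wins
    }
}
```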

Checking all objects in our scene may not be very efficient, though. So as a first improvement I recommend testing only objects that lie (at least partly) within your viewing frustum. Under normal circumstances you already have that code in your scene graph as part of the software culling routine that pumps objects into the render pipeline, but if you build your scene graph / game engine from scratch, you will need to deal with culling yourself.

## Source Code

I received many requests for the source code (many more than I actually expected). I therefore provide the code below, along with some explanatory comments. Let's first have a look at two classes that we are going to need. The first one is a simple 3D vector representation using floats. We will use it to represent the vectors in the later part. The vector class is certainly no rocket science, so you may want to just briefly scan over it.

```java
public class Vector3f {

    public float x, y, z;

    // Constructors as well as getters/setters omitted for brevity!!
    // Only the methods necessary for this tutorial are kept.
    // The original class contains many more methods...

    public Vector3f add(Vector3f a) {
        x += a.x;
        y += a.y;
        z += a.z;
        return this;
    }

    public Vector3f set(Vector3f v) {
        this.x = v.x;
        this.y = v.y;
        this.z = v.z;
        return this;
    }

    public Vector3f subAndAssign(Vector3f a, Vector3f b) {
        x = a.x - b.x;
        y = a.y - b.y;
        z = a.z - b.z;
        return this;
    }

    /**
     * Returns the length of the vector, also called L2-Norm or Euclidean Norm.
     */
    public float l2Norm() {
        return (float) Math.sqrt(x*x + y*y + z*z);
    }

    public Vector3f crossAndAssign(Vector3f a, Vector3f b) {
        float tempX = a.y * b.z - a.z * b.y;
        float tempY = a.z * b.x - a.x * b.z;
        float tempZ = a.x * b.y - a.y * b.x;
        x = tempX;
        y = tempY;
        z = tempZ;
        return this;
    }

    public Vector3f scale(float scalar) {
        x *= scalar;
        y *= scalar;
        z *= scalar;
        return this;
    }

    public Vector3f normalize() {
        float length = l2Norm();
        x /= length;
        y /= length;
        z /= length;
        return this;
    }
}
```

So far for our vector helper class. It shouldn't be too surprising; similar classes can be found in many libs, including LWJGL.

The next class is the actually important one. It represents a line, more specifically the picking ray. In this case the line is represented by a point and a direction.

```java
public class PickingRay {

    private Vector3f clickPosInWorld = new Vector3f();
    private Vector3f direction = new Vector3f();

    /**
     * Computes the intersection of this ray with the X-Y plane (where Z = 0)
     * and writes it back to the provided vector.
     */
    public void intersectionWithXyPlane(float[] worldPos) {
        float s = -clickPosInWorld.z / direction.z;
        worldPos[0] = clickPosInWorld.x + direction.x * s;
        worldPos[1] = clickPosInWorld.y + direction.y * s;
        worldPos[2] = 0;
    }

    public Vector3f getClickPosInWorld() {
        return clickPosInWorld;
    }

    public Vector3f getDirection() {
        return direction;
    }
}
```
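To see how `intersectionWithXyPlane` is meant to be used, here is a minimal, self-contained sketch: the `Vec` class is a stripped-down stand-in for `Vector3f`, and the ray origin and direction are made-up example values.

```java
public class PickingRayDemo {

    // Minimal stand-in for the Vector3f class from the post.
    static class Vec {
        float x, y, z;
        Vec(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    /** Intersects a ray (origin + direction) with the plane z = 0, as in PickingRay. */
    static float[] intersectionWithXyPlane(Vec origin, Vec direction) {
        float s = -origin.z / direction.z;  // solve origin.z + s * direction.z = 0
        return new float[] { origin.x + direction.x * s, origin.y + direction.y * s, 0f };
    }

    public static void main(String[] args) {
        // A ray starting above the plane, pointing down and slightly along +x:
        Vec origin = new Vec(1f, 2f, 10f);
        Vec direction = new Vec(0.5f, 0f, -1f);
        float[] hit = intersectionWithXyPlane(origin, direction);
        System.out.println(hit[0] + " " + hit[1] + " " + hit[2]); // 6.0 2.0 0.0
    }
}
```

The returned array holds the world-space point where the picking ray crosses the ground plane, which is typically what you check afterwards (e.g. "which tile was clicked?").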

We are almost there. We just need to define the remaining prerequisites. You need to set them up before computing the actual picking ray:

- `position` is a 3D vector (`Vector3f`) holding the world position of the camera
- `view` is a 3D vector (`Vector3f`) holding the viewing direction of the camera
- `pickingRay` is the helper class given above; basically it is a line with starting point `PickingRay.clickPosInWorld` and direction `PickingRay.direction`. You need to instantiate it.
- two scalars `viewportWidth` and `viewportHeight`, being the width and height of the current view port
- two scalars `screenX` and `screenY`, the mouse coordinates on your 2D screen (origin in the lower left corner)

Now we are able to compute the screen in world space. Note that `screenHoritzontally` and `screenVertically` correspond to `h` and `v` in the pseudo code above.

```java
// look direction
view.subAndAssign(lookAt, position).normalize();

// screenX
screenHoritzontally.crossAndAssign(view, up).normalize();

// screenY
screenVertically.crossAndAssign(screenHoritzontally, view).normalize();

final float radians = (float) (viewAngle * Math.PI / 180f);
float halfHeight = (float) (Math.tan(radians / 2) * nearClippingPlaneDistance);
float halfScaledAspectRatio = halfHeight * getViewportAspectRatio();

screenVertically.scale(halfHeight);
screenHoritzontally.scale(halfScaledAspectRatio);
```

We now have the screen position in world space. `lookAt` is the point (!) in world coordinates that the camera looks at, so that `view` becomes the viewing direction. The scalar `nearClippingPlaneDistance` is the distance to the near clipping plane of your viewing frustum. Yes, I am still using pre-4.0 OpenGL code here 😉

Btw, you only need to execute the code above when you change the camera settings (view direction, position, etc.) in order to update the screen position in world space. Many frames are often rendered without the screen position changing, so this may be a nice performance optimization (well, the vector normalization (square root) and the `tan()` are a bit expensive).

Finally, in order to compute the actual picking lines, you need to do the following (`screenX` and `screenY` are the 2D mouse coordinates):

```java
public void picking(float screenX, float screenY, PickingRay pickingRay) {
    // start at the camera position and move to the center of the near plane
    // (note: the view vector is scaled by the near clipping plane distance,
    // as pointed out in the comments; plain "add(view)" only works for near = 1)
    pickingRay.getClickPosInWorld().set(view);
    pickingRay.getClickPosInWorld().scale(nearClippingPlaneDistance);
    pickingRay.getClickPosInWorld().add(position);

    // translate mouse coordinates so that the origin lies in the center of the view port
    screenX -= (float) viewportWidth / 2f;
    screenY -= (float) viewportHeight / 2f;

    // normalize to 1
    screenX /= ((float) viewportWidth / 2f);
    screenY /= ((float) viewportHeight / 2f);

    pickingRay.getClickPosInWorld().x += screenHoritzontally.x*screenX + screenVertically.x*screenY;
    pickingRay.getClickPosInWorld().y += screenHoritzontally.y*screenX + screenVertically.y*screenY;
    pickingRay.getClickPosInWorld().z += screenHoritzontally.z*screenX + screenVertically.z*screenY;

    // direction = intersection point minus camera position
    pickingRay.getDirection().subAndAssign(pickingRay.getClickPosInWorld(), position);
}
```

Done! Now you can use the `pickingRay` instance to check for intersections with the objects in your scene.

I hope this helps; please leave a comment if you like the tutorial or if you have a question.

Cheers,

Johannes

Hi, I'm newbie in 3D graphics and I'm trying to use ray tracing for collision detection for a game similar to Tron race.

I've found your post very enlightening, but I still don't get how to use this to determine whether the player is going to crash, because in that case I'm not mapping the X and Y coordinates like you do with the mouse picking.

In my case I just have the coordinates of the player character (a motorcycle) and the direction it is going. But that is in world coordinates not in the 2d coordinates like the mouse cursor.

I don't know if I'm explaining myself very well, but I'd appreciate your help!

Hi!

Thanks for your comment. The scenario you describe seems to be a typical collision detection problem. From what I understand, you want to have several objects on a map that may collide. In this case I recommend you regard your objects on the map as simple boxes (i.e. "bounding boxes") and test for collisions for each of the individual boxes. A slightly simpler approach may be to use the center points of the objects and compare the distances to each object; once the distance is closer than a certain threshold (i.e. radius), you have a collision.

Both approaches need some tuning in a real-time environment, because it may happen that your collision lies between two time frames and you never detect it. But this is solvable as well.
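The center-point variant can be sketched in a few lines of Java (the positions, radii, and method name are made-up example values, not from the post):

```java
public class SphereCollision {

    /** True if two objects, approximated by bounding spheres, overlap. */
    static boolean collides(float[] a, float radiusA, float[] b, float radiusB) {
        float dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        float threshold = radiusA + radiusB;
        // compare squared distances to avoid the square root
        return dx*dx + dy*dy + dz*dz <= threshold * threshold;
    }

    public static void main(String[] args) {
        float[] bike = {0f, 0f, 0f};
        float[] wall = {3f, 0f, 0f};
        System.out.println(collides(bike, 1f, wall, 1f));   // false: distance 3 > 1 + 1
        System.out.println(collides(bike, 2f, wall, 1.5f)); // true:  distance 3 <= 2 + 1.5
    }
}
```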

Does this help?

Well, let me explain myself better. As I understood you used your window coordinates of the cursor to make the ray. Like this:

pos = cameraPos + view + h*x + v*y

and this:

dir = pos - cameraPos

In my case I don't have that because I'm not using the mouse to detect the collision, I have to use the player position (in world coordinates) to do so.

How does this case change the procedure you described? Do the x and y of the player object work the same as the x and y of the cursor?

Hi Noctum!

Well, if you want to cast the picking ray from an object in your scene that is different from the camera, you can indeed apply the same principle. However, in my case the problem I am solving is the mapping from 2D screen space (mouse coordinates) into 3D world space. In your case, you are already in world space, so I do not fully understand why you want to burden yourself with this approach for collision detection. You don't necessarily need ray casting for collision detection, unless you are trying to simulate a bullet shot and want to determine where the bullet hits a target.

Anyways, in world space, the same principle may be applied as follows:

Let's say you want to cast the ray from an object at position O (being a 3D-vector) into direction I (again being a 3D-Vector). The ray is then described as a line of the form O + I*v, where v is the scalar to scale the length of the line. In addition, let us assume you have several objects K1, K2, K3, etc. as 3D mid-points in your scene that you want to detect a collision of the ray with. For simplicity, we assume each object with mid-point K1, K2, K3, etc. to be a sphere with radius r. All you need to do is then to solve the following equation (in the case for K1):

|K1 - (O + I*v)| <= r

Since all parameters are known except for v, and we are in 3D, you solve for v (analytically) and get an equation for v. If there is a v that satisfies the equation, you have a collision of the line with object K1. You need to compute the v for each of K1, K2, K3, etc. respectively, of course. Let me know if this helps.
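Written out with d = O - K1, the condition expands to a quadratic in v (a sketch of the algebra; the norm bars denote vector length):

```latex
\|K_1 - (O + I v)\| \le r
\;\Longleftrightarrow\;
\|d + I v\|^2 \le r^2, \qquad d = O - K_1
```

```latex
(I \cdot I)\, v^2 + 2\,(d \cdot I)\, v + (d \cdot d - r^2) \le 0
```

A real v satisfying this exists exactly when the discriminant is non-negative, i.e. when (d·I)² ≥ (I·I)(d·d − r²), which is the standard ray-sphere hit test.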

Johannes

Thank you, that was really helpful!

Keep the good work with the blog!

Nestor

Hi there,

I really like this explanation of ray picking and would like to see the source code you produced. Can you send it to jeroen at delgine dot com?

Thanks in advance. And thumbs up - you're doing a great job with this blog!

Cheers,

Jeroe

Hi Jeroe, sorry for the late reply!! I am happy to send you the code. It will actually only be one class called Camera in which I provide the routines for billboarding, picking, etc. Please let me know if you have questions.

Cheers,

Johannes

Great blog! Realy good explanation. Please share the source code.

Great explanation, could you share the source code please?

or send me to email please?

hi Yoda!

I will send it to you right away! Sorry for the delay!

Regards, Johannes

hi,

can you please share the source code to me ?

thx for great tutorial.

Hi,

Thank you for your explanation of ray picking; it is very well made and detailed, and it helped me understand the way 3D picking is done. But I have a question: isn't the V vector the same as the camera up vector?

Hi, yes!! V points in the same direction as the camera-up vector. It gets scaled to a different length though (not normalized to length 1). You could indeed save the cross product for V and simply copy the camera-up vector, later rescaling the copy.

Good point, thanks

Johannes

Thank you, once again.

Sorry for my misunderstanding, I was misled by the fact that in OpenGL ES you specify an up direction, not an up vector as I thought, so there is no way of getting the up vector of the camera without calculating it the way you did above.

can i receive the source code please?

Hello,

Thx for tutorial, and can i get the source code too?

Hey,

Great explanation, can I get the source code please. send it to andreea dot zor at gmail dot com

HI! Great tutorial! Can I getthe entire source code please? thx

hi,

can you send me the source code?

the mail is:

g dot salvati at gmail dot com

Hey,

Fantastic tutorial. I was wondering if I could get the source code.

Could I get the source code please? if you could send it to twfolsom11@charter.net. thanks

Thanks for the tutorial but I just wanted to add a correction. The final formula for pos you have written is "pos = cameraPos + view + h*x + v*y", but this should be pos = cameraPos + nearClippingPlaneDistance*view + h*x + v*y. This new formula should correctly translate the point onto the view plane.

Great article, but I have a few questions

I have an image created with 3ds Max. The near clipping plane is 0, the far clipping plane is 1000. If I follow your instructions, vLength becomes 0 and hLength becomes 0. Is that a problem? Also, how do you "scale v by vLength", and how do you perform "cameraPos + view + h*x + v*y"? Sorry, I'm relatively new to vectors and matrices. Any help by anyone much appreciated!

Hi Johannes,

I discovered that my near clipping plane distance is actually 1 but I still can't produce a meaningful result. The values in v, h and view (all normalized) are tiny compared to my camera position coordinates (0,120,390), so cameraPos + view + h*x + v*y is virtually the same as cameraPos. The x and y values are also tiny because they have been scaled down to between -1 and 1. When you say "scale v by vLength" do you mean multiply each element of v by vLength? Obviously I'm still doing something wrong. If you haven't got time to explain maybe you could be kind enough to send me the source code.

Regards, Darren.

Got it working. Yippee!

Hehe, awesome!

```javascript
// Intersect ray with plane Z=0 (playing surface)
// The equation of a plane is Ax + By + Cz + D = 0, represented here as [A, B, C, D]
var P = [0, 0, 1, 0];

// Find the unit normal of the plane
var mag = Math.sqrt(P[0]*P[0] + P[1]*P[1] + P[2]*P[2]);
var N = [P[0]/mag, P[1]/mag, P[2]/mag];

// Get ray length
var t = -(DotProduct(N, cameraPos) + P[3]) / DotProduct(N, dir);
var intersect = [ cameraPos[0] + t*dir[0], cameraPos[1] + t*dir[1], cameraPos[2] + t*dir[2] ];

// Test the result by drawing an image on the screen
DrawImage(myImage, intersect[0], intersect[1], intersect[2]);
```

Regards, Darren

footydomain.com

could you please send me the source code ?

thanks

Hi guys, I updated my post, now providing the source code!! Enjoy,

Johannes

Hi schabby,

Thanks for the source code. I am trying to use your approach to pick single vertex. My application contains only one object which is a face.

But my problem is how to find out the closest vertex to the ray.

Any ideas??

Cheers

Hi schabby,

Thanks for sharing the source code. I'm trying to use your mechanism in order to implement picking.

My application has several triangles which are drawn on the screen using glDrawElements with mode GL_TRIANGLE_STRIP.

Now I have to pick the particular point information from the mouse intersection, and for that I need to implement ray tracing. My problem is: when I'm using a perspective projection, how do I need to generate the ray such that I can find my triangle's information?

Any Ideas????

Cheers.

Hi,

Well, it is actually pretty much the same approach as outlined in my post. You need to set up the ray and test for intersections with each triangle in world space. Triangle intersection can be computed in closed form, meaning that there is a formula that spits out true/false for a given triangle and line. You will have to google the formula though.
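One well-known closed-form test of this kind is the Möller–Trumbore algorithm; a sketch in Java (the array-based vector helpers and the epsilon threshold are assumptions of this example):

```java
public class RayTriangle {

    /** Möller-Trumbore ray/triangle test: returns the ray parameter t of the hit, or -1 on a miss. */
    static double intersect(double[] orig, double[] dir, double[] v0, double[] v1, double[] v2) {
        double[] e1 = sub(v1, v0), e2 = sub(v2, v0);
        double[] p = cross(dir, e2);
        double det = dot(e1, p);
        if (Math.abs(det) < 1e-9) return -1;   // ray parallel to triangle plane
        double inv = 1.0 / det;
        double[] t = sub(orig, v0);
        double u = dot(t, p) * inv;            // first barycentric coordinate
        if (u < 0 || u > 1) return -1;
        double[] q = cross(t, e1);
        double v = dot(dir, q) * inv;          // second barycentric coordinate
        if (v < 0 || u + v > 1) return -1;
        double dist = dot(e2, q) * inv;        // ray parameter of the hit point
        return dist >= 0 ? dist : -1;
    }

    static double[] sub(double[] a, double[] b) {
        return new double[] { a[0]-b[0], a[1]-b[1], a[2]-b[2] };
    }

    static double dot(double[] a, double[] b) {
        return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
    }

    static double[] cross(double[] a, double[] b) {
        return new double[] { a[1]*b[2]-a[2]*b[1], a[2]*b[0]-a[0]*b[2], a[0]*b[1]-a[1]*b[0] };
    }

    public static void main(String[] args) {
        // A triangle in the z = -2 plane, hit head-on from the origin:
        double t = intersect(new double[]{0, 0, 0}, new double[]{0, 0, -1},
                             new double[]{-1, -1, -2}, new double[]{1, -1, -2}, new double[]{0, 1, -2});
        System.out.println(t); // 2.0
    }
}
```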

Hope this helps,

Johannes

Thanks for the code provided, it was super helpful.

A problem we found is that the formula breaks when Near is different from 1.

The line that reads: pos = cameraPos + view + h*x + v*y

Should be: pos = cameraPos + (view * Near) + h*x + v*y

Thanks!

Post 23 by Chris completely fixed the issue I was having, you should definitely alter your source and pseudo code. I believe pardoman also has the same solution.

Hi,

thanks for your comment. I will update my post accordingly!

btw, I am revising my tutorials at the moment to bring them up to date with OpenGL 4.0. Picking/selection will work slightly differently then, because one can derive most of the stuff right from the view matrix.

Cheers,

Johannes

Good afternoon.

I've tried to understand how your provided source code works, but no luck

pickingRay.getDirection().sub(position);

Since the provided vector class does not have this method, I understand it should be the same as

pickingRay.getDirection().sub(pickingRay.getDirection(), position); ?

Also, how exactly does the "intersection" checking work? Do I just feed an empty float array to intersectionWithXyPlane and then check the numbers for something?

Is there a possibility for you to post full source code?

Cheers,

Ray

Great stuff! Thank you.

Hi. I didn't find the source code,

please send me it

Cheers,

Astemir

Hi Gabriele, the source code is "in-line" in the post; there is no download. Best, J

It's nice tutorial,

Can you please provide source code for same to me.

Thanks

Hi, could you send me the source code? I'm still a little confused.

Can you please provide source code for same to me.

Only one thing - in OpenGL, the Y-axis runs from the bottom to the top of NDC coordinate space, but from the top to the bottom of screen space - you have to invert Y to make this work correctly for OpenGL.
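A minimal sketch of that flip (assuming mouse coordinates with the origin in the upper left, as most windowing toolkits report them; the function name is made up):

```java
public class MouseYFlip {

    /** Window systems usually report y from the top; OpenGL's viewport origin is bottom-left. */
    static float toOpenGlY(float windowY, float viewportHeight) {
        return viewportHeight - windowY;
    }

    public static void main(String[] args) {
        System.out.println(toOpenGlY(0f, 600f));   // top of the window maps to 600.0
        System.out.println(toOpenGlY(600f, 600f)); // bottom maps to 0.0
    }
}
```

The flipped value is what you would pass as `screenY` into the `picking()` method above.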

Hey,

nice tutorial, it helped me a lot to understand it. But I don't understand how intersectionWithXyPlane(float[]) works.

Where do I call this function? Maybe you could tell me how I can bind it to a mouse-click event, and which function I need to call to get the coordinates in OpenGL.

I'm trying to implement ray picking for an ortho projection.

How do I compute the lengths of h and v in the view port? The view angle for an orthographic projection is 0.

So the equation vLength = tan( rad / 2 ) * nearClippingPlaneDistance cannot be used anymore...
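One way to adapt the approach for an orthographic projection (a sketch only; the glOrtho-style parameters and helper name are assumptions): in an orthographic view every picking ray shares the same direction, namely the view vector, and only the ray origin shifts across the view plane. The half-extents come straight from the projection bounds instead of a field-of-view angle.

```java
public class OrthoPick {

    /**
     * Builds the origin of a picking ray for an orthographic projection.
     * left/right/bottom/top are the glOrtho-style parameters; x/y are mouse
     * coordinates with the origin in the lower left of the viewport; h and v
     * are the normalized horizontal/vertical screen vectors from the post.
     */
    static double[] rayOrigin(double[] cameraPos, double[] h, double[] v,
                              double left, double right, double bottom, double top,
                              double x, double y, double viewportWidth, double viewportHeight) {
        // map the mouse position into [-1, 1] on both axes
        double nx = x / viewportWidth * 2 - 1;
        double ny = y / viewportHeight * 2 - 1;
        double halfW = (right - left) / 2, halfH = (top - bottom) / 2;
        // shift the camera position across the view plane;
        // the ray direction is simply the (normalized) view vector for every pixel
        return new double[] {
            cameraPos[0] + h[0] * nx * halfW + v[0] * ny * halfH,
            cameraPos[1] + h[1] * nx * halfW + v[1] * ny * halfH,
            cameraPos[2] + h[2] * nx * halfW + v[2] * ny * halfH,
        };
    }

    public static void main(String[] args) {
        // camera at (0,0,10) looking down -z, ortho box 8 wide and 6 high
        double[] origin = rayOrigin(new double[]{0, 0, 10},
                                    new double[]{1, 0, 0}, new double[]{0, 1, 0},
                                    -4, 4, -3, 3, 600, 600, 800, 600);
        System.out.println(origin[0] + " " + origin[1] + " " + origin[2]); // 2.0 3.0 10.0
    }
}
```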

What a great work!!!!!

well done, I am intrigued to see the code. Could you please send it to me? baamiis7 at gmail dot com.

Khalid

```c
double g_center[3] = {0, 0, 0};
double g_eye   [3] = {-5, 5, 5};
double g_up    [3] = {0, 1, 0};

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

// clear the matrix
glLoadIdentity();

// viewing transformation
gluLookAt(g_eye[0],    g_eye[1],    g_eye[2],
          g_center[0], g_center[1], g_center[2],
          g_up[0],     g_up[1],     g_up[2]);

...

double vectorView[3];
double vectorV[3];
double vectorH[3];

// view = cameraLookAt - cameraPosition // 3D float vector
// normalize view
Substraction3_DBL(vectorView, g_center, g_eye);
Normalize3_DBL(vectorView);

// h = crossProduct( view, cameraUp ) // 3D float vector
// normalize h
CrossProduct3_DBL(vectorH, vectorView, g_up);
Normalize3_DBL(vectorH);

// v = crossProduct( h, view ) // 3D float vector
// normalize v
// Note that v is actually pointing in the same direction as the camera-up vector,
// but we compute it via the cross product to illustrate the method.
CrossProduct3_DBL(vectorV, vectorH, vectorView);
Normalize3_DBL(vectorV);
```

Unfortunately I can't go any further, because

vectorV = {0.40824829046386307, 0.81649658092772615, -0.40824829046386307}

which, as you can see, is far from g_up[3] = {0, 1, 0}.

Well done.

But I don't know how to apply the "intersection" checking. How do I use intersectionWithXyPlane()?

Please help

Thanks

There's something I don't understand about the pseudocode:

```
view = cameraLookAt - cameraPosition // 3D float vector
normalize view
```

if gluLookAt doesn't return a value, how can this be used this way?