Schabby's Blog
Reserve Orbital Defence Commander

This blog post explains a common and versatile approach to OpenGL picking called "ray picking". An example in pseudo-code is given below.

Picking is the process of finding objects in your scene based on user input. Most commonly, you want to determine which object a user has clicked with the mouse. The 2D mouse coordinates serve as a reference on the viewport to identify the projected object that has been clicked. A similar scenario is a first-person shooter, where the gun is basically the picking pointer: if you fire the gun, the trajectory of the bullet is traced through the scene and collisions are detected, similar to a laser pointer shooting a ray through a scene until it hits an object and marks it with a small red dot.

Screenshot OpenGL Picking

A screenshot showing a hex map in an LWJGL application with a selected tile, as an example of 3D picking

OpenGL Selection Mode vs. Ray Picking

There are two common ways of implementing picking. The first one makes use of a special OpenGL feature in which you render your scene in a so-called selection mode. In selection mode, you re-render only a few pixels around the mouse cursor. Nothing is drawn to the frame buffer; instead, OpenGL writes hit records for previously named (labeled) objects into a special selection buffer that you read out afterwards.
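
For readers who have never used it, a minimal, hypothetical sketch of that legacy selection-mode flow could look like the following (fixed-function OpenGL via LWJGL 2). renderSceneWithNames() is a placeholder for your own render pass, and the pick-matrix setup is only indicated in a comment; this is a sketch for illustration, not the code this post is about:

import java.nio.IntBuffer;
import org.lwjgl.BufferUtils;
import org.lwjgl.opengl.GL11;

// Hypothetical selection-mode picking: returns the "name" of the closest
// object under the cursor, or -1 if nothing was hit.
public int pickWithSelectionMode(int mouseX, int mouseY) {
    IntBuffer selectBuffer = BufferUtils.createIntBuffer(256);
    GL11.glSelectBuffer(selectBuffer);

    GL11.glRenderMode(GL11.GL_SELECT);   // switch to selection mode
    GL11.glInitNames();
    GL11.glPushName(0);

    // set up a projection restricted to a few pixels around (mouseX, mouseY)
    // (typically via gluPickMatrix) and re-render the scene, calling
    // GL11.glLoadName(objectId) before each pickable object
    renderSceneWithNames();              // placeholder for your render pass

    int hits = GL11.glRenderMode(GL11.GL_RENDER); // back to normal rendering

    // each hit record: {name count, min depth, max depth, name(s)}
    int pickedName = -1;
    long closestDepth = Long.MAX_VALUE;
    int offset = 0;
    for (int i = 0; i < hits; i++) {
        int nameCount = selectBuffer.get(offset++);
        long minDepth = selectBuffer.get(offset++) & 0xFFFFFFFFL;
        offset++;                                 // skip max depth
        if (nameCount > 0 && minDepth < closestDepth) {
            closestDepth = minDepth;
            pickedName = selectBuffer.get(offset);
        }
        offset += nameCount;                      // skip the names of this record
    }
    return pickedName;
}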

The second approach is the aforementioned "Ray Picking" and is independent of OpenGL. As outlined above, it simulates a line that is shot through the scene until it hits an object. I personally prefer this approach for the following reasons:

  • You don't need an additional render pass. A render pass can take a considerable amount of time, and special clipping/culling code is required to render only the small part of the scene on which you want to perform the picking. In addition, you would render a less complex scene (omitting textures, lighting, normals, etc.) in selection mode if possible, which can add a high degree of complexity to your drawing routines (imagine breaking up display lists). Ray picking obviously requires some code to trace the ray through your scene, but this is usually less complex.
  • Ray picking can be done independently from the rest of the game code as a simultaneous, "read-only" operation on the scene graph. In some environments such as AWT/JOGL, where the input devices are handled by a separate thread, your picking code can run as part of the handler thread and does not impact the performance of the rest of the game. This comes in handy especially in scenarios where you need to perform picking very often (e.g. mouse hovering, a target laser on the in-game gun).
  • In pretty much all Java environments, calling OpenGL takes a considerable amount of time due to JNI. It is therefore beneficial to minimize the number of OpenGL calls. Rendering in selection mode would mean additional OpenGL calls that I would rather substitute with ray picking.

However, there are scenarios where ray picking is simply inferior or impractical. Consider a scene with a lot of small objects (e.g. a tree with little branches and leaves), or one where you modify your geometry in shaders (e.g. level of detail, splines, etc.). These are usually cases where the rendering pipeline, by design, is more efficient than iterating over all objects in a scene and testing for intersection.

Picking by Ray Tracing

Ray Picking

Ray picking is the process of shooting a line (i.e. a "ray") from the camera through the 2D viewscreen (onto which the 3D scene is projected) into the scene until it hits an object. To do so, we need to know the camera position and a point on the viewscreen (for example the mouse cursor position). The first point is trivial; the second is a bit more difficult to determine. Let's say we have a 2D point (x, y) on the viewscreen and now want to map this point into world coordinates. One way would be to make use of the inverted viewing matrix, but we follow a more geometrical approach: we compute the position of the viewscreen plane in world space, map the 2D point onto this plane, and from there into world space.

The algorithm in pseudo code follows below. Note that cameraLookAt is the 3D point the camera looks at (as used in gluLookAt), cameraPosition is the current position of the camera in world space, and cameraUp is the up vector of the camera.

x = getMouseX() // scalar
y = getMouseY() // scalar

view = cameraLookAt - cameraPosition // 3D float vector
normalize view

h = crossProduct( view, cameraUp ) // 3D float vector
normalize h

v = crossProduct( h, view) // 3D float vector
normalize v

So much for h and v. Note that v actually points in the same direction as the camera-up vector (assuming the up vector is perpendicular to the viewing direction), but we compute it via the cross product to illustrate the method.

The vectors h and v are currently normalized to length 1. We now need to compute the lengths of h and v on the view port. In this example I assume that you set up your frustum with a field-of-view angle in degrees (fovy), as is commonly the case with gluPerspective. Let nearClippingPlaneDistance be the distance to the near clipping plane, and width and height the width and height of the view port (so width / height is the aspect ratio).

// convert fovy to radians 
rad = fovy * PI / 180
vLength = tan( rad / 2 ) * nearClippingPlaneDistance
hLength = vLength * (width / height)

scale v by vLength
scale h by hLength

The two scalars vLength and hLength, in combination with v and h, span the view port plane around its center point. At this point it is worth mentioning that it may make sense to compute h and v only when you update the camera orientation, because the normalization involves a square root, which is a relatively expensive operation (and the tan may also be avoided).

Now it is time to map the 2D mouse coordinates onto the view port plane.

// translate mouse coordinates so that the origin lies in the center
// of the view port
x -= width / 2
y -= height / 2

// scale mouse coordinates so that half the view port width and height
// becomes 1
y /= (height / 2)
x /= (width / 2)

// linear combination to compute intersection of picking ray with
// view port plane
pos = cameraPos + view*nearClippingPlaneDistance + h*x + v*y

// compute direction of picking ray by subtracting intersection point
// with camera position
dir = pos - cameraPos

That's pretty much it. We now have pos, the intersection point of the picking ray with the view port plane, and the picking ray direction dir. Together they describe a line that can now be intersected with the individual objects of the scene.
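
In parametric form, every point on the picking ray can be written as

P(t) = pos + t * dir,   t >= 0

so testing an object for intersection boils down to finding a parameter t (if any) for which P(t) lies on or inside that object.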

// brute force
for all objects in the scene 
  test for intersection and keep closest
end for

So by intersecting the picking ray with all the visible objects in your scene while keeping track of the distance, you determine the object that is hit by the mouse cursor.
Checking all objects in the scene may not be very efficient, though. As a first improvement I recommend testing only objects that lie (at least partly) within your viewing frustum. Under normal circumstances you already have that code in your scene graph as part of the software culling routine that feeds objects into the render pipeline, but if you build your scene graph / game engine from scratch, you will need to deal with culling yourself.
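
As a hypothetical sketch of such a brute-force test, the following method intersects the picking ray with one bounding sphere per object and keeps the closest hit. It uses the Vector3f and PickingRay classes given in the Source Code section further below; SceneObject, getCenter() and getBoundingRadius() are assumed helpers that are not part of this post:

import java.util.List;

// Hypothetical brute-force picking: test the ray against a bounding sphere
// per object and return the closest object that is hit (or null).
public SceneObject pickClosest(PickingRay ray, List<SceneObject> objects) {
    Vector3f o = ray.getClickPosInWorld(); // ray origin on the view port plane
    Vector3f d = ray.getDirection();       // ray direction (not necessarily unit length)
    float dd = d.x*d.x + d.y*d.y + d.z*d.z;

    SceneObject closest = null;
    float closestT = Float.MAX_VALUE;

    for (SceneObject obj : objects) {
        Vector3f c = obj.getCenter();
        float r = obj.getBoundingRadius();

        // parameter t of the point on the ray that is closest to the sphere center
        float t = ((c.x-o.x)*d.x + (c.y-o.y)*d.y + (c.z-o.z)*d.z) / dd;
        if (t < 0f) continue; // sphere lies behind the ray origin

        // squared distance between that closest point and the sphere center
        float px = o.x + d.x*t - c.x;
        float py = o.y + d.y*t - c.y;
        float pz = o.z + d.z*t - c.z;
        float distSq = px*px + py*py + pz*pz;

        // hit if the closest point lies within the bounding sphere
        if (distSq <= r*r && t < closestT) {
            closestT = t;
            closest = obj;
        }
    }
    return closest;
}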

Source Code

I received many requests for the source code (many more than I actually expected). I therefore provide the code below, along with some explanatory comments. Let's first have a look at two classes that we are going to need. The first one is a simple 3D vector representation using floats. We will use it to represent the vectors in the later part. The vector class is certainly no rocket science, so you may want to just briefly scan over it.

public class Vector3f 
{
	public float x, y, z;
 
	// Constructors as well as getters/setters omitted for brevity!
	// Only the methods necessary for this tutorial are kept;
	// the original class contains many more methods...
 
	public Vector3f add(Vector3f a) {
		x += a.x;
		y += a.y;
		z += a.z;
 
		return this;
	}
 
	public Vector3f set(Vector3f v)	{
		this.x = v.x;
		this.y = v.y;
		this.z = v.z;
 
		return this;
	}
 
	public Vector3f subAndAssign(Vector3f a, Vector3f b) {
		x = a.x - b.x;
		y = a.y - b.y;
		z = a.z - b.z;
 
		return this;
	}
 
	/**
	 * Returns the length of the vector, also called L2-Norm or Euclidean Norm.
	 */
	public float l2Norm() {
		return (float) Math.sqrt(x*x+y*y+z*z);
	}
 
	public Vector3f crossAndAssign(Vector3f a, Vector3f b) {
		float tempX = a.y * b.z - a.z * b.y;
		float tempY = a.z * b.x - a.x * b.z;
		float tempZ = a.x * b.y - a.y * b.x;
 
		x = tempX;
		y = tempY;
		z = tempZ;
 
		return this;
	}
 
	public Vector3f scale(float scalar) {
		x *= scalar;
		y *= scalar;
		z *= scalar;
 
		return this;
	}
 
	public Vector3f normalize() {
		float length = l2Norm();
		x /= length;
		y /= length;
		z /= length;
 
		return this;
	}
}

So much for our vector helper class. It shouldn't be too surprising. Similar classes can be found in many libs, including LWJGL.

The next class is actually important. It represents a line, more specifically the picking ray. In this case the line is represented by a point and a direction.

public class PickingRay 
{
	private Vector3f clickPosInWorld = new Vector3f();
	private Vector3f direction = new Vector3f();
 
	/**
	 * Computes the intersection of this ray with the X-Y Plane (where Z = 0)
	 * and writes it back to the provided vector.
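	 * Note: the ray must not be parallel to the X-Y plane, i.e. direction.z must not be 0.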
	 */
	public void intersectionWithXyPlane(float[] worldPos)
	{
		float s = -clickPosInWorld.z / direction.z;
		worldPos[0] = clickPosInWorld.x+direction.x*s;
		worldPos[1] = clickPosInWorld.y+direction.y*s;
		worldPos[2] = 0;
	}
 
	public Vector3f getClickPosInWorld() {
		return clickPosInWorld;
	}
	public Vector3f getDirection() {
		return direction;
	}	
}

We are almost there. We just need to define the remaining prerequisites. You need to set them up before computing the actual picking ray:

  • position is a 3d vector (Vector3f) holding the world position of the camera
  • view is a 3d vector (Vector3f) holding the viewing direction of the camera
  • pickingRay is an instance of the helper class given above; basically it is a line with starting point PickingRay.clickPosInWorld and direction PickingRay.direction. You need to instantiate it.
  • two scalars viewportWidth and viewportHeight being the width and height of the current view port.
  • two scalars screenX and screenY being the mouse coordinates on your 2D screen (origin in the lower left corner)

Now we are able to compute the screen plane in world space. Note that screenHoritzontally and screenVertically correspond to h and v in the pseudo code above.

        // look direction
        view.subAndAssign(lookAt, position).normalize();
 
        // screenX
        screenHoritzontally.crossAndAssign(view, up).normalize();
 
        // screenY
        screenVertically.crossAndAssign(screenHoritzontally, view).normalize();
 
        final float radians = (float) (viewAngle*Math.PI / 180f);
        float halfHeight = (float) (Math.tan(radians/2)*nearClippingPlaneDistance);
        float halfScaledAspectRatio = halfHeight*getViewportAspectRatio();
 
        screenVertically.scale(halfHeight);
        screenHoritzontally.scale(halfScaledAspectRatio);

We now have the screen position in world space. lookAt is the point (!) in world coordinates that the camera looks at, so that view becomes the viewing direction. The scalar nearClippingPlaneDistance is the distance to the near clipping plane of your viewing frustum. Yes, I am still using pre-4.0 OpenGL code here 😉
Btw, you only need to execute the code above when you change the camera settings (view direction, position, etc.) to update the screen position in world space. Many frames are often rendered without the camera changing, so this can be a nice performance optimization (the vector normalization (square root) and the tan() are a bit expensive).
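
One simple way to realize this optimization (a sketch, not part of the original code) is a dirty flag on the camera: setters mark the cached vectors as stale, and they are recomputed lazily. Here, updateScreenVectors() stands for the view / screenHoritzontally / screenVertically block shown above:

// Hypothetical caching sketch: recompute the screen vectors only when the
// camera has actually changed.
private boolean screenVectorsDirty = true;

public void setPosition(Vector3f newPosition) {
    position.set(newPosition);
    screenVectorsDirty = true;   // do the same in setLookAt(), setUp(), setFovy(), ...
}

private void ensureScreenVectors() {
    if (screenVectorsDirty) {
        updateScreenVectors();   // the block of code shown above
        screenVectorsDirty = false;
    }
}

Calling ensureScreenVectors() at the beginning of picking() then keeps the expensive normalization and tan work out of the per-frame or per-mouse-event path.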

Finally, in order to compute the actual picking ray, you need to do the following (screenX and screenY are the 2D mouse coordinates):

    public void picking(float screenX, float screenY, PickingRay pickingRay)
    {
        // start at the camera position and move to the center of the near plane
        // (pos = cameraPos + view*nearClippingPlaneDistance, see the pseudo code
        // above and the correction discussed in the comments below)
        pickingRay.getClickPosInWorld().set(position);
        pickingRay.getClickPosInWorld().x += view.x * nearClippingPlaneDistance;
        pickingRay.getClickPosInWorld().y += view.y * nearClippingPlaneDistance;
        pickingRay.getClickPosInWorld().z += view.z * nearClippingPlaneDistance;
 
        // translate mouse coordinates so that the origin lies in the center of the view port
        screenX -= (float)viewportWidth/2f;
        screenY -= (float)viewportHeight/2f;
 
        // scale mouse coordinates so that half the view port width and height becomes 1
        screenX /= ((float)viewportWidth/2f);
        screenY /= ((float)viewportHeight/2f);
 
        // add the linear combination of the scaled screen vectors h and v
        pickingRay.getClickPosInWorld().x += screenHoritzontally.x*screenX + screenVertically.x*screenY;
        pickingRay.getClickPosInWorld().y += screenHoritzontally.y*screenX + screenVertically.y*screenY;
        pickingRay.getClickPosInWorld().z += screenHoritzontally.z*screenX + screenVertically.z*screenY;
 
        // direction of the picking ray: intersection point minus camera position
        pickingRay.getDirection().subAndAssign(pickingRay.getClickPosInWorld(), position);
    }

Done! Now you can use the pickingRay instance to check for intersections with the objects in your scene.
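
To connect the pieces, and because several comments below ask where intersectionWithXyPlane() comes in, here is a hedged usage sketch. The camera object (an instance of the class holding picking()) and the use of LWJGL's org.lwjgl.input.Mouse are assumptions on my side; Mouse conveniently reports coordinates with the origin in the lower left corner, matching the convention above:

// Hypothetical click handler: turn a mouse click into a point on the Z = 0 plane.
PickingRay pickingRay = new PickingRay();
float[] clickedWorldPos = new float[3];

if (Mouse.isButtonDown(0)) {
    // build the picking ray from the current mouse position
    camera.picking(Mouse.getX(), Mouse.getY(), pickingRay);

    // where does the ray hit the ground plane (Z = 0)?
    pickingRay.intersectionWithXyPlane(clickedWorldPos);

    System.out.println("clicked world position: " + clickedWorldPos[0]
            + ", " + clickedWorldPos[1] + ", " + clickedWorldPos[2]);
}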

I hope this helps. Please leave a comment if you like the tutorial or if you have a question.

Cheers,

Johannes


59 Replies

  1. Nocturn says:

    Hi, I'm a newbie in 3D graphics and I'm trying to use ray tracing for collision detection in a game similar to a Tron race.

    I've found your post very enlightening but I don't finally get how to use this to determine if the player is going to crash, because in that case I'm not mapping the X Y coordinates like you do with the mouse picking.

    In my case I just have the coordinates of the player character (a motorcycle) and the direction it is going. But that is in world coordinates not in the 2d coordinates like the mouse cursor.

    I don't know if I'm explaining myself very well, but I'd appreciate your help!

  2. schabby says:

    Hi!

    Thanks for your comment. The scenario you describe sounds like a typical collision detection problem. From what I understand, you want to have several objects on a map that may collide. In this case I recommend you regard your objects on the map as simple boxes (i.e. "bounding boxes") and test for collisions between the individual boxes. A slightly simpler approach may be to use the center points of the objects and compare the distances between them; once a distance is closer than a certain threshold (i.e. radius), you have a collision.

    Both approaches need some tuning in a real-time environment, because it may happen that your collision falls between two time steps and you never detect it. But this is solvable as well.

    Does this help?

  3. Nocturn says:

    Well, let me explain myself better. As I understood it, you used the window coordinates of the cursor to make the ray. Like this:

    pos = cameraPos + view + h*x + v*y

    and this:

    dir = pos - cameraPos

    In my case I don't have that because I'm not using the mouse to detect the collision, I have to use the player position (in world coordinates) to do so.

    How does this case change the procedure you described? Do the x and y of the player object work the same as the X and Y of the cursor?

  4. schabby says:

    Hi Noctum!

    Well, if you want to cast the picking ray from an object in your scene that is different from the camera, you can indeed apply the same principle. However, in my case the problem I am solving is the mapping from 2D screen space (mouse coordinates) into 3D world space. In your case, you already are in world space, so I do not fully understand why you want to burden yourself with this approach for collision detection. :) You don't necessarily need ray casting for collision detection, unless you are trying to simulate a bullet shot and want to determine where the bullet hits a target.

    Anyways, in world space the same principle may be applied as follows:
    Let's say you want to cast the ray from an object at position O (a 3D vector) into direction I (again a 3D vector). The ray is then described as a line of the form O + I*v, where v is the scalar that scales the length of the line. In addition, let us assume you have several objects K1, K2, K3, etc. given as 3D mid-points in your scene with which you want to detect a collision of the ray. For simplicity, we assume each object with mid-point K1, K2, K3, etc. to be a sphere with radius r. All you need to do is then to solve the following inequality (in the case of K1):

    |K1 - (O + I*v)| <= r

    Since all parameters are known except for v, and we are in 3D, you solve for v analytically. If there is a v that satisfies the inequality, you have a collision of the line with object K1. You need to compute this for all K1, K2, K3, etc. respectively, of course. Let me know if this helps :)

    Johannes

  5. Nocturn says:

    Thank you, that was really helpful!

    Keep the good work with the blog!

    Nestor

  6. Jeroen says:

    Hi there,

    I really like this explanation of ray picking and would like to see the source code you produced. Can you send it to jeroen at delgine dot com?

    Thanks in advance. And thumbs up - you're doing a great job with this blog!

    Cheers,
    Jeroe

  7. schabby says:

    Hi Jeroe, sorry for the late reply!! I am happy to send you the code. It will actually only be one class called Camera in which I provide the routines for billboarding, picking, etc. Please let me know if you have questions.

    Cheers,

    Johannes

  8. Dinu says:

    Great blog! Really good explanation. Please share the source code.

  9. Yoda says:

    Great explanation, could you share the source code please?

  10. Yoda says:

    or send me to email please?

  11. schabby says:

    hi Yoda!

    I will send it to you right away! Sorry for the delay!

    Regards, Johannes

  12. enkor says:

    hi,
    can you please share the source code to me ?

    thx for great tutorial.

  13. And Cos says:

    Hi,
    Thank you for your explanation of ray picking; it is very well made and detailed, and it helped me understand the way 3D picking works. But I have a question: isn't the V vector the same as the camera up vector?

  14. schabby says:

    Hi, yes!! V points in the same direction as the camera-up vector. It gets scaled to a different length, though (not normalized to length 1). You could indeed save yourself the cross product for V and simply copy the camera-up vector and later rescale the copy.

    Good point, thanks

    Johannes

  15. And Cos says:

    Thank you, once again.
    Sorry for my misunderstanding; I was misled by the fact that in OpenGL ES you specify an up direction, not an up vector as I thought, so there is no way of getting the up vector of the camera without calculating it the way you did above.

  16. And Cos says:

    can i receive the source code please?

  17. Snark says:

    Hello,
    Thx for tutorial, and can i get the source code too?

  18. peon says:

    Hey,
    Great explanation, can I get the source code please. send it to andreea dot zor at gmail dot com

  19. denny says:

    Hi! Great tutorial! Can I get the entire source code please? thx

  20. gabriele says:

    hi,
    can you send me the source code?
    the mail is:
    g dot salvati at gmail dot com

  21. D Shimmies says:

    Hey,
    Fantastic tutorial. I was wondering if I could get the source code.

  22. Travis says:

    Could I get the source code please? if you could send it to twfolsom11@charter.net. thanks

  23. Chris says:

    Thanks for the tutorial but I just wanted to add a correction. The final formula for pos you have written is "pos = cameraPos + view + h*x + v*y", but this should be pos = cameraPos + nearClippingPlaneDistance*view + h*x + v*y. This new formula should correctly translate the point onto the view plane.

  24. D J Wray says:

    Great article, but I have a few questions :)
    I have an image created with 3ds Max. Near clipping plane is 0. Far clipping plane is 1000. If I follow your instructions vLength becomes 0 and hLength becomes 0. Is that a problem? Also, how do you "scale v by vLength" and how do you perform "cameraPos + view + h*x + v*y"? Sorry, I'm relatively new to vectors and matrices. Any help by anyone much appreciated!

  25. D J Wray says:

    Hi Johannes,

    I discovered that my near clipping plane distance is actually 1 but I still can't produce a meaningful result. The values in v, h and view (all normalized) are tiny compared to my camera position coordinates (0,120,390), so cameraPos + view + h*x + v*y is virtually the same as cameraPos. The x and y values are also tiny because they have been scaled down to between -1 and 1. When you say "scale v by vLength" do you mean multiply each element of v by vLength? Obviously I'm still doing something wrong. If you haven't got time to explain maybe you could be kind enough to send me the source code.

    Regards, Darren.

  26. D J Wray says:

    Got it working. Yippee!

  27. schabby says:

    Hehe, awesome! :)

  28. D J Wray says:

    //Intersect ray with plane Z=0 (playing surface)
    //The equation of a plane is Ax + By + Cz + D = 0, represented here as [A, B, C, D]
    var P=[0, 0, 1, 0];
    //Find the unit normal of the plane
    var mag=Math.sqrt(P[0]*P[0]+P[1]*P[1]+P[2]*P[2]);
    var N=[P[0]/mag, P[1]/mag, P[2]/mag];
    //Get ray length
    var t=-(DotProduct(N, cameraPos) + P[3]) / DotProduct(N, dir);
    var intersect=[ cameraPos[0]+t*dir[0], cameraPos[1]+t*dir[1], cameraPos[2]+t*dir[2] ];
    //Test the result by drawing an image on the screen
    DrawImage(myImage,intersect[0],intersect[1],intersect[2]);

    Regards, Darren
    footydomain.com

  29. far says:

    could you please send me the source code ?
    thanks

  30. schabby says:

    Hi guys, I updated my post, now providing the source code!! Enjoy,

    Johannes

  31. alou says:

    Hi schabby,

    Thanks for the source code. I am trying to use your approach to pick a single vertex. My application contains only one object, which is a face.
    But my problem is how to find out the closest vertex to the ray.
    Any ideas??

    Cheers

  32. srinivas says:

    Hi schabby,
    Thanks for sharing the source code. I'm trying to use your mechanism in order to implement picking.
    My application has several triangles which are drawn on the screen using glDrawElements with mode GL_TRIANGLE_STRIP.
    Now I have to provide the ability to pick a particular point's information from the mouse intersection. For that we need to implement ray tracing. My problem is: when I am using a perspective projection, how do I need to generate the ray in such a way that I can find my triangle's information?

    Any Ideas????

    Cheers.

  33. schabby says:

    Hi,

    well it is actually pretty much the same approach as outlined in my post. You need to set up the ray and test for intersections with each triangle in world space. Triangle intersection can be computed in closed form, meaning that there is a formula that spits out true/false for a given triangle and line. You will have to google the formula though :)

    Hope this helps,

    Johannes

  34. pardoman says:

    Thanks for the code provided, it was super helpful.

    A problem we found is that the result is wrong when Near is different from 1.
    The line that reads: pos = cameraPos + view + h*x + v*y
    Should be: pos = cameraPos + (view * Near) + h*x + v*y

    Thanks!

  35. Laykun says:

    Post 23 by Chris completely fixed the issue I was having, you should definitely alter your source and pseudo code. I believe pardoman also has the same solution.

  36. schabby says:

    Hi,

    thanks for your comment. I will update my post accordingly!

    Btw, I am revising my tutorials at the moment to bring them up to date with OpenGL 4.0. Picking/selection will work slightly differently then, because one can derive most of the stuff directly from the view matrix.

    Cheers,

    Johannes

  37. Ray says:

    Good afternoon.

    I've tried to understand how your provided source code works, but no luck :(

    pickingRay.getDirection().sub(position);
    Since the provided vector class does not have this method, I understand it should be the same as

    pickingRay.getDirection().sub(pickingRay.getDirection(), position); ?

    Also, how exactly does the intersection checking work? Do I just feed an empty float array to intersectionWithXyPlane and then check the numbers for something?

    Is there a possibility for you to post full source code?

    Cheers,
    Ray

  38. zz says:

    Great stuff! Thank you.

  39. Virus says:

    Hi. I didn't find the source code,
    please send it to me

    Cheers,
    Astemir

  40. schabby says:

    Hi Gabriele, the source code is "in-line" in the post, no download :) Best, J

  41. Pankaj says:

    It's a nice tutorial,
    can you please provide the source code for it to me?
    Thanks

  42. Alex Lopez says:

    Hi, could you send me the source code? I'm still a little confused.

  43. YOM says:

    Can you please provide the source code to me as well?

  44. John Connors says:

    Only one thing - in OpenGL, the Y-axis runs from the bottom to the top of NDC coordinate space, and from the top to the bottom of screen space - you have to invert Y to make this work correctly for OpenGL

  45. Tjoma says:

    Hey,
    nice tutorial, it helped me a lot to understand it. But I don't understand how intersectionWithXyPlane(float[]) works.
    Where do I call this function? Maybe it is possible to tell how I can bind it into a mouse-click event, and what function I need to call to get the coordinates in OpenGL.

  46. Alex says:

    I am trying to implement ray picking for an orthographic projection.

    How do I compute the lengths of h and v in the view port? For an orthographic projection there is no view angle.

    So the equation vLength = tan( rad / 2 ) * nearClippingPlaneDistance cannot be used anymore...

  47. K Hamdou says:

    What great work!!!!!
    Well done, I am intrigued to see the code. Could you please send it to me: baamiis7 at gmail dot com.

    Khalid

  48. Catalin says:

    double g_center[3] = {0, 0, 0};
    double g_eye [3] = {-5, 5, 5};
    double g_up [3] = {0, 1, 0};

    glClear (GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    // clear the matrix
    glLoadIdentity();

    // viewing transformation
    gluLookAt (
    g_eye [0], g_eye [1], g_eye [2],
    g_center [0], g_center[1], g_center[2],
    g_up [0], g_up [1], g_up [2]);

    ...

    double vectorView [3];
    double vectorV [3];
    double vectorH [3];

    //view = cameraLookAt - cameraPosition // 3D float vector
    //normalize view
    Substraction3_DBL (vectorView, g_center, g_eye);
    Normalize3_DBL (vectorView);

    //h = crossProduct( view, cameraUp ) // 3D float vector
    //normalize h
    CrossProduct3_DBL (vectorH, vectorView, g_up);
    Normalize3_DBL (vectorH);

    //v = crossProduct( h, view) // 3D float vector
    //normalize v
    //Note that v is actually pointing in the same direction like the camera-up vector,
    // but we compute it via the cross product to illustrate the method.
    CrossProduct3_DBL (vectorV, vectorH, vectorView);
    Normalize3_DBL (vectorV);

    Unfortunately I can't go any further because
    vectorV = {0.40824829046386307, 0.81649658092772615, -0.40824829046386307}
    which, as you can see, is far from g_up [3] = {0, 1, 0};

  49. Legend says:

    Well done.

    But I don't know how to apply the "intersection" checking. Is it intersectionWithXyPlane()?

    Please help

    Thanks

  50. Bropocalypse says:

    There's something I don't understand about the pseudocode.

    view = cameraLookAt - cameraPosition // 3D float vector
    normalize view

    If gluLookAt doesn't return a value, how can it be used this way?
