Inverse perspective mapping equations

Discussion Overview

The discussion revolves around the challenge of mapping real-world coordinates to pixel coordinates in an image captured by a tilted camera on a robot. Participants explore the theoretical and practical aspects of creating a transformation matrix that relates these coordinates, considering factors such as camera parameters and lens distortion.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant describes their experience with calibrating a camera by marking a grid on the floor and taking a picture to derive the mapping, suggesting this might be a practical approach.
  • Another participant emphasizes the need to account for lens distortion and proposes starting with a perfect lens model before applying corrections for distortion.
  • Concerns are raised about the need to recalibrate if the camera angle changes, questioning the stability of the mapping under different configurations.
  • Participants discuss the necessary parameters for the transformation matrix, including focal length, height, camera angle, and pixel coordinates, while questioning if additional parameters like aperture angle are needed.
  • One participant suggests automating the calibration process to avoid manual input of parameters, highlighting the potential for empirical calibration to account for real-world deviations.
  • There is uncertainty regarding how to derive the transformation matrix with lens effects, with references made to literature that might provide insights into the necessary calculations.

Areas of Agreement / Disagreement

Participants express a range of views on the best approach to derive the transformation matrix, with no consensus reached on a specific method or formula. The discussion remains unresolved regarding the optimal parameters and techniques for mapping real-world coordinates to image coordinates.

Contextual Notes

Participants note limitations related to the camera's low resolution and the potential need for recalibration with changes in camera setup. The discussion also highlights the complexity of accounting for lens distortion and the need for empirical validation of theoretical models.

darie2808
Hi. I am making a robot that is supposed to detect a red ball with a camera and then work out where the ball is.
The robot has a camera on top of it that is tilted at an angle. At each frame the camera detects the red ball and returns two coordinates, xp and yp, which are the pixel coordinates of the middle of the ball in the captured image.
What I want is to map each point on the ground to a point in the image. In other words, to change the perspective from the tilted camera view to a 2D top-down (bird's-eye) view of the ground in front of the robot.
Everything about the camera is known: the focal length, the angle at which it is tilted, the height at which it is mounted, the horizontal and vertical aperture angles... everything.
All I need is the formula that relates X, Y of the real world to xp and yp.
I tried several equations that I found online, but although they are all similar, they differ from one another and none of them worked.
Note that the (0,0) point of the image is at its center.
Help would be much appreciated!
Best wishes,
Darie
 
I've worked with a group that had to solve the same problem. I didn't work on that part of the project but my recollection is that for the most part what we did was to mark off the floor in a grid pattern. We then calibrated the camera by taking a picture of the grid on the floor and numerically working out the mapping. That may be one of the better ways of doing it (particularly if you end up changing cameras or simply adjusting the rig as time goes by). This worked quite well actually because at the time we were making a point of using very cheap web cameras which presented an appreciable amount of distortion.
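
The "numerically working out the mapping" step is not spelled out in the thread. One common way to do it, assuming lens distortion is small enough to ignore, is to fit a 3x3 planar homography to the grid correspondences by least squares. The sketch below only illustrates that idea: the function names (`solve_linear`, `fit_homography`, `apply_homography`) are made up for this example, and fixing the last matrix entry to 1 is an assumption that holds for a tilted camera looking at the floor but not for every configuration.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_homography(pixel_pts, ground_pts):
    """Least-squares fit of H (with H[2][2] fixed to 1) mapping pixels to ground points."""
    rows, rhs = [], []
    for (xp, yp), (X, Y) in zip(pixel_pts, ground_pts):
        # Two linear equations per correspondence, obtained from
        # X = (a*xp + b*yp + c) / (g*xp + h*yp + 1) and the analogous one for Y.
        rows.append([xp, yp, 1.0, 0.0, 0.0, 0.0, -xp * X, -yp * X])
        rhs.append(X)
        rows.append([0.0, 0.0, 0.0, xp, yp, 1.0, -xp * Y, -yp * Y])
        rhs.append(Y)
    # Normal equations (A^T A) h = A^T b for the 8 unknown entries.
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    Atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(AtA, Atb)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, xp, yp):
    """Map a pixel (xp, yp) to ground coordinates (X, Y) using H."""
    w = H[2][0] * xp + H[2][1] * yp + H[2][2]
    X = (H[0][0] * xp + H[0][1] * yp + H[0][2]) / w
    Y = (H[1][0] * xp + H[1][1] * yp + H[1][2]) / w
    return X, Y
```

At least four grid points, no three of them collinear, are needed; extra points help average out marking errors. If the camera is re-tilted, only the photo of the grid has to be retaken and the fit rerun.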

Now, if you wish to do this purely from theory, I think the starting point is coming up with a mapping for a perfect lens. That is, the first thing you need is to account for the visual projection of the floor space at the focal point. This would only depend upon the height of the camera. You would probably then have to derive a projection for the distortion that is caused by the lens (this could be captured empirically, as we did in the approach I described above). My guess is that you would most likely only need to accommodate simple barrel distortion. Once you come up with these projections, you would simply need to apply the inverse projections to the captured image (image * inverse barrel distortion * inverse perspective projection).
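
As a rough illustration of the "inverse barrel distortion" step, here is a sketch that assumes the simplest one-coefficient radial model, x_d = x_u * (1 + k1 * r_u^2), with coordinates measured from the image center and divided by the focal length in pixels. The model choice, the coefficient `k1`, and the function names are assumptions made for this example, not something given in the thread; `k1` would have to be estimated from a calibration image. With this sign convention a negative `k1` pulls points toward the center, which is the barrel case.

```python
def distort_point(xu, yu, k1):
    """Apply the assumed one-coefficient radial model to an ideal (undistorted) point."""
    scale = 1.0 + k1 * (xu * xu + yu * yu)
    return xu * scale, yu * scale


def undistort_point(xd, yd, k1, iterations=20):
    """Invert the radial model by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the
    distortion factor evaluated at the current undistorted estimate.
    Converges quickly when the distortion is mild.
    """
    xu, yu = xd, yd
    for _ in range(iterations):
        scale = 1.0 + k1 * (xu * xu + yu * yu)
        xu, yu = xd / scale, yd / scale
    return xu, yu
```

The undistorted point would then be fed into the inverse perspective projection, matching the order "inverse barrel distortion, then inverse perspective projection" described above.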
 
Thank you for your answer! We might do the same as you did. However, our camera has a very low resolution (174*144); I hope it will work. But there is a problem with that: if we change the angle of the camera, all the mapping will have to be done again, right? I might have misunderstood you, though, since you said that changing the rig is not a problem.
What parameters do we really need? Focal length, height, camera angle, and xp and yp, right? Do we need anything else, like the aperture angle or something like that?
All the equations we encountered required different parameters.
Again, what I need is the 'transformation matrix' such that:
(X,Y) = transformation matrix * (xp,yp).
Do you know where I could find this transformation matrix?
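
For what it is worth, here is a sketch of the ideal (distortion-free pinhole) version of the formula being asked for. It rests on several assumptions that are not stated in the thread: X points to the robot's right and Y straight ahead along the floor, with the origin directly below the camera; `tilt_rad` is the angle of the optical axis below the horizontal; yp increases upward in the image; and the focal length is expressed in pixels, which is where the aperture angle comes in. Note that the result is a ratio, so it cannot be written as a plain 2x2 matrix acting on (xp, yp); it only becomes a 3x3 matrix in homogeneous coordinates. All function names are hypothetical.

```python
import math


def focal_length_px(image_width_px, horizontal_aperture_rad):
    """Focal length in pixel units from the full horizontal aperture (field-of-view) angle."""
    return (image_width_px / 2.0) / math.tan(horizontal_aperture_rad / 2.0)


def pixel_to_ground(xp, yp, f_px, height, tilt_rad):
    """Intersect the viewing ray through pixel (xp, yp) with the floor plane."""
    denom = f_px * math.sin(tilt_rad) - yp * math.cos(tilt_rad)
    if denom <= 0.0:
        # The ray points at or above the horizon and never reaches the floor.
        raise ValueError("pixel does not see the ground plane")
    X = height * xp / denom
    Y = height * (f_px * math.cos(tilt_rad) + yp * math.sin(tilt_rad)) / denom
    return X, Y


def ground_to_pixel(X, Y, f_px, height, tilt_rad):
    """Project the floor point (X, Y) into the image (the forward mapping)."""
    depth = Y * math.cos(tilt_rad) + height * math.sin(tilt_rad)
    if depth <= 0.0:
        raise ValueError("point is behind the camera")
    xp = f_px * X / depth
    yp = f_px * (Y * math.sin(tilt_rad) - height * math.cos(tilt_rad)) / depth
    return xp, yp
```

Under these assumptions the only inputs are the camera height, the tilt angle, and the focal length in pixels (or, equivalently, an aperture angle together with the image width). Lens distortion is ignored here and would have to be corrected beforehand.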
 
Well, ideally you would just write an automated procedure to calibrate the rig. That way, you can set up your calibration grid on the floor and then just run the calibration procedure any time you wish. With such a program you would not need to input parameters like focal length, height, etc. Plus, one would expect some slight deviation from the ideal when put into practice anyway, and thus there should be some kind of empirical calibration in the end.

As for the transformation matrix: no, I don't know how to do that with a lens. I know how to do it easily enough assuming that there is no lens, but I do not remember enough of my perspective and projection lessons to recall how you would accommodate the added refraction of a lens between the picture plane or focal point and the object. The problem is that the ray tracing involves skew lines instead of lines that are parallel to the axis. You can probably find more about this in Born and Wolf's "Principles of Optics" text. I believe they cover the behavior of skew lines in 4.9. But you should be able to find the transformation for basic distortions easily enough.

Either way, I would take a look at Born and Wolf's text or even take a look at some engineering drafting textbooks. Techniques on how to do technical perspective drawings may be more applicable in this case since you want to do a rough ray trace based method.

EDIT: Off-hand, here is one way I would think of the problem, but I do not know if it is right. I would just do a projection drawing of the grid given the height and angle of the camera and the distance from the grid. I would project onto a picture plane that is one focal length away from the focal point, with the picture plane oriented according to the angle of the camera. I'm not sure if this is correct, but it sounds reasonable to me. This would give you the basic projection results, to which the lens aberrations and distortions would then be applied.
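
Written out, this picture-plane construction for an ideal pinhole camera (no lens distortion) gives the relations below. The conventions are assumed here rather than taken from the thread: camera at height h above the floor, optical axis tilted down by θ from the horizontal, ground coordinates X (to the right) and Y (forward) with the origin directly below the camera, focal length f in pixel units, and image coordinates (x_p, y_p) measured from the image center with y_p pointing up. The first line is the forward projection of a floor point; the second is its inverse in homogeneous coordinates, where w is a scale factor that is divided out at the end.

```latex
x_p = \frac{f\,X}{Y\cos\theta + h\sin\theta},
\qquad
y_p = \frac{f\,(Y\sin\theta - h\cos\theta)}{Y\cos\theta + h\sin\theta}

\begin{pmatrix} wX \\ wY \\ w \end{pmatrix}
=
\begin{pmatrix}
h & 0 & 0 \\
0 & h\sin\theta & h f\cos\theta \\
0 & -\cos\theta & f\sin\theta
\end{pmatrix}
\begin{pmatrix} x_p \\ y_p \\ 1 \end{pmatrix}
```

As a sanity check, the image center (x_p = y_p = 0) maps to X = 0, Y = h / tan θ, which is the point where the optical axis meets the floor.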
 
This was my idea on how it is probably done but I'm not sure.
 

Attachments

  • method.jpg
