Optics: Working out distance of an object from single digital camera using the captured image

1. Feb 17, 2015

Aliando

Hi, I'm working on a project to determine how far a pedestrian is from a single digital camera, but having never done optics before, I'm struggling to find the right equation to use. I'd really appreciate any help!

The method I am trying to use assumes you know the height of the camera above the ground, the inclination of the camera in relation to the ground (parallel with the ground I assume is easiest) and the pixel density of the digital camera used to capture the image.

Then the pixel height of the pedestrian's feet in the captured image should correlate with the pedestrian's distance from the camera.

Side View:

POV of Camera:

I have looked at using the thin lens equation, 1/S1 + 1/S2 = 1/f, where S1 is the distance from the object to the lens, S2 is the distance from the lens to the image and f is the focal length, but to no avail. I also have no idea how to relate the distance of the pedestrian to the pixel height (the distance up the captured image).

Thanks in advance! I'm really stuck

2. Feb 17, 2015

DEvens

You will need some details of the lens in the camera. You will need to know the focal length at least. You might need more details about the optics as well, especially if there is a zoom lens on the camera. These days, even a cheapo digital SLR camera has some kind of zoom lens.

http://en.wikipedia.org/wiki/Zoom_lens

Maybe you can do something similar to what an autofocus system does.

http://en.wikipedia.org/wiki/Autofocus

3. Feb 17, 2015

Aliando

Ah ok, thanks. I'd use a prime lens because of the fixed focal length, with a large depth of field because I have to identify the distances of multiple pedestrians at the same time. I'd know the focal length, field of view (angle of view), pixel density of the CCD, height of the camera above the ground, and inclination of the camera relative to the ground. I'm still stumped on what equation to use to relate the distance of the object from the camera to its distance from the bottom of the image.

4. Feb 17, 2015

Stephen Tashi

You could start with a theoretical model of a camera. Imagine the pixels are a grid on an x-y graph. The positive z-axis points "out of the paper" and away from the scene you are viewing. The position of your eye is at (0,0,1). To find where a location in the world at (x,y,z) is in the pixel array, you draw a line from (x,y,z) to (0,0,1) and determine where this line intersects the x-y plane. This will be at (x/z, y/z). To find out what pixel that represents, you have to establish the coordinates of the corners of the pixels on the x-y plane. You can do that if you know the horizontal and vertical field-of-view of the camera and the dimensions of the pixel array. (In this version of the model, locations in the world you are viewing have negative z-coordinates. If you don't like that, you can change the conventions. To do geometry in terms of vectors, it is convenient to have a "right-handed" coordinate system. If you want the world to have positive z-coordinates, you have to change the conventions for the x-y axes, perhaps letting x be the vertical and y the horizontal.)

For a real life camera, I think you will need to do some empirical measurements. If a real camera is pointed normal to a sheet of graph paper, the image formed on the pixel array probably isn't a perfect rectangular grid. You might need some empirical formula to convert between the theoretical value (x/z, y/z) and the actual pixel location.
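One simple form such an empirical correction could take is a single-term radial model, where a point at radius r from the image centre is moved to r(1 + k1 r^2). This is only a sketch under that assumption, not something fixed by the discussion above: the coefficient k1 and the function names are hypothetical, and k1 would have to be fitted from measurements such as the graph-paper test.

```python
def distort(u, v, k1):
    """Map ideal (pinhole) image-plane coordinates to distorted ones.

    Assumes a one-term radial model: each point is scaled by (1 + k1 * r^2),
    where r is its distance from the image centre. k1 is a hypothetical
    coefficient that would be fitted from calibration measurements.
    """
    r2 = u * u + v * v
    scale = 1.0 + k1 * r2
    return u * scale, v * scale


def undistort(ud, vd, k1, iterations=20):
    """Invert distort() by fixed-point iteration.

    Starts from the distorted point and repeatedly divides by the scale
    factor evaluated at the current estimate. This converges quickly when
    the distortion is mild (|k1 * r^2| well below 1).
    """
    u, v = ud, vd
    for _ in range(iterations):
        r2 = u * u + v * v
        scale = 1.0 + k1 * r2
        u, v = ud / scale, vd / scale
    return u, v
```

In use, the measured pixel location would first be converted to image-plane coordinates, passed through `undistort`, and only then treated as the theoretical value from the pinhole model.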

In the theoretical model, if you know the equation of the plane on which the pedestrians walk, you can draw the line from (0,0,1) through (x/z, y/z) and solve for where it hits that plane. From that (x,y,z) location, you can determine whatever distances are of interest.
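For the common case of flat ground, the ray-and-plane step reduces to a little trigonometry, sketched here in Python. The sketch assumes an ideal pinhole camera with no distortion, the optical axis passing through the centre of the image, and rows counted from the top of the image. The function and parameter names are made up for illustration.

```python
import math


def focal_length_px(image_height_px, vertical_fov_deg):
    """Focal length expressed in pixels, from the vertical field of view."""
    half_fov = math.radians(vertical_fov_deg) / 2.0
    return (image_height_px / 2.0) / math.tan(half_fov)


def ground_distance(row_px, image_height_px, vertical_fov_deg,
                    camera_height_m, tilt_down_deg=0.0):
    """Horizontal distance to the ground point seen at a given pixel row.

    row_px is counted from the top of the image, so larger rows are lower
    in the picture. tilt_down_deg is how far the optical axis points below
    the horizontal (0 means the camera is level).
    Returns None if the ray never reaches the ground (at or above horizon).
    """
    f_px = focal_length_px(image_height_px, vertical_fov_deg)
    centre_row = image_height_px / 2.0
    # Angle of the ray below the optical axis.
    below_axis = math.atan2(row_px - centre_row, f_px)
    # Angle of the ray below the horizontal.
    depression = math.radians(tilt_down_deg) + below_axis
    if depression <= 0.0:
        return None
    return camera_height_m / math.tan(depression)
```

For example, with a 1000-row image, a 90 degree vertical field of view and a level camera 1.5 m above the ground, `ground_distance(750, 1000, 90, 1.5)` comes out at about 3 m, while any row at or above the image centre returns `None` because that ray points at or above the horizon.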

Estimating precise distances from pixel values may require some sophisticated methods. An uncertainty of plus or minus one pixel may be a big uncertainty at long distances. If you have data that gives pixel location vs time for a pedestrian, then you know that distances should progress as people actually walk, not in some jumpy manner. So it would be more accurate to forecast the path of the person as a function of time by some method that fits a reasonable function to the raw data.
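To put rough numbers on both points, here is a small Python sketch. It assumes a level pinhole camera over flat ground, so that distance = camera height x focal length (in pixels) / row offset below the image centre, and it uses a plain least-squares straight line as the "reasonable function", which is only one possible choice. All names are hypothetical.

```python
def distance_from_offset(offset_px, f_px, camera_height_m):
    """Level camera: distance to a ground point imaged offset_px rows
    below the image centre (offset_px must be positive)."""
    return camera_height_m * f_px / offset_px


def one_pixel_spread(offset_px, f_px, camera_height_m):
    """Difference in distance between one row above and one row below.

    Requires offset_px > 1 so that both neighbouring rows are still
    below the horizon.
    """
    far = distance_from_offset(offset_px - 1, f_px, camera_height_m)
    near = distance_from_offset(offset_px + 1, f_px, camera_height_m)
    return far - near


def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return slope, intercept
```

With a focal length of 1000 pixels and a camera 1.5 m up, a pedestrian 15 m away sits 100 rows below the centre and a one-pixel error either way spans roughly 0.3 m, but at 150 m (10 rows below centre) the same error spans roughly 30 m. Fitting a line to the per-frame distances and reading the estimate off the fit, rather than off each frame, smooths out that jumpiness for someone walking at a steady pace.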

Edit: The pixel location of (x,y,z) will be at (x/(|z|+1), y/(|z|+1)) instead of (x/z, y/z). To get the mathematically simpler formula (x/|z|, y/|z|), you could put the eye at (0,0,0) and put the pixel array on the plane (x,y,-1). To get the right formula, think about similar triangles - hopefully more clearly than I do!
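The similar-triangles argument is easy to check numerically. The Python sketch below intersects the line from the eye through a world point with the image plane directly, so either convention can be tried without trusting a closed-form divisor, and then maps the result to a pixel using the field of view. It assumes an ideal pinhole camera whose image plane is one unit from the eye (true of both conventions above), with rows counted from the top of the image. The function names are hypothetical.

```python
import math


def project(point, eye_z, plane_z):
    """Intersect the line from the eye at (0, 0, eye_z) through `point`
    with the plane z = plane_z and return the (x, y) of the intersection."""
    x, y, z = point
    # Fraction of the way from the eye to the point at which the line
    # crosses the image plane.
    t = (plane_z - eye_z) / (z - eye_z)
    return t * x, t * y


def to_pixel(u, v, width_px, height_px, hfov_deg, vfov_deg):
    """Convert image-plane coordinates to (column, row).

    Assumes the image plane is one unit from the eye, so its half-width
    and half-height are tan(hfov/2) and tan(vfov/2). Rows increase
    downwards, so positive v (up in the world) gives a smaller row.
    """
    half_w = math.tan(math.radians(hfov_deg) / 2.0)
    half_h = math.tan(math.radians(vfov_deg) / 2.0)
    col = (u / half_w + 1.0) * width_px / 2.0
    row = (1.0 - v / half_h) * height_px / 2.0
    return col, row


# Eye at (0,0,1), pixel plane z = 0, world point at z = -3:
# the divisor is 1 - z = 4.
u1, v1 = project((2.0, 4.0, -3.0), eye_z=1.0, plane_z=0.0)

# Eye at (0,0,0), pixel plane z = -1, world point at z = -4:
# the divisor is |z| = 4.
u2, v2 = project((2.0, 4.0, -4.0), eye_z=0.0, plane_z=-1.0)
```

Both calls describe the same physical arrangement (a point four units in front of the eye) and give the same image-plane location, (0.5, 1.0).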

Last edited: Feb 17, 2015