# Issue with perspective projection?

## Main Question or Discussion Point

Summary: Perspective projection is often referred to when talking about camera models

I have the following problem. Perspective projection is often referred to when talking about camera models (https://en.wikipedia.org/wiki/3D_projection#Perspective_projection). I don't think I understand it very well, even though the concept was taught back when I was in junior high or even primary school. I think programmers with a computer vision background will be familiar with it.

I have read quite a few "tutorials" on perspective projection, as well as computer vision textbooks like "Computer Vision: Algorithms and Applications" and "Multiple View Geometry in Computer Vision (Second Edition)". But there seem to be a lot of different representation conventions, which is a bit unfriendly to beginners. Just to give a feeling, here's one of them:

May I ask if there are some good, easy-to-read, self-contained articles that can help beginners like me understand perspective projection? Ideally they would explain quite clearly the physical meaning of its parameters, especially the "scale factor".

Any ideas? Thanks in advance.

Delta2

## Answers and Replies

.Scott
Homework Helper
I ran into perspective projection early in my career as a software engineer and found it to be no problem.
1) Translate your space so that the focal point of the camera is at (0,0,0). This simply means subtracting the coordinates of the focal point from all the points that will be projected.
2) Rotate your space so that the camera is looking down the Z axis in the -Z direction, with the X and Y axes oriented as you want them in the projection.
3) Eliminate anything with a non-negative Z (those points are behind the camera).
4) Transform: X = -x/z, Y = -y/z

That's it.

Of course, you may want to project more than just points - such as conic sections. But if you're struggling with that wiki article, start by practicing with points.
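The four steps above can be sketched in a few lines of NumPy. This is just an illustration of the recipe, not code from the thread; the function name, the focal-length-1 image plane, and the convention that R maps world axes to camera axes are my own assumptions.

```python
import numpy as np

def project_points(points, camera_pos, R):
    """Perspective-project 3-D world points onto an image plane.

    points     : (N, 3) array of world coordinates
    camera_pos : (3,) focal point of the camera
    R          : (3, 3) rotation taking world axes to camera axes,
                 camera looking down the -Z axis (assumed convention)
    """
    # Step 1: translate so the focal point is at the origin.
    p = np.asarray(points, dtype=float) - camera_pos
    # Step 2: rotate into camera coordinates.
    p = p @ R.T
    # Step 3: keep only points strictly in front of the camera (z < 0).
    p = p[p[:, 2] < 0]
    # Step 4: perspective divide, X = -x/z, Y = -y/z.
    return np.column_stack((-p[:, 0] / p[:, 2], -p[:, 1] / p[:, 2]))
```

For example, with the camera at the origin and no rotation, the point (1, 2, -2) projects to (0.5, 1.0), while any point with z >= 0 is discarded.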

Stephen Tashi
That's it.
From the OP's links, the problem the OP asks about is an inverse problem to finding screen coordinates. It has to do with finding the 3-D coordinates of the camera from information about the object being viewed and the object's screen coordinates.

.Scott
Homework Helper
From the OP's links, the problem the OP asks about is an inverse problem to finding screen coordinates. It has to do with finding the 3-D coordinates of the camera from information about the object being viewed and the object's screen coordinates.
I have also run into that kind of problem, but the solution depends on the specifics. If the points on an aerial photograph are associated with lat/long/altitude values, then the photographic transform can be worked out using simple linear algebra. If there are "too many" points, a least-squares best fit can be determined. In practice, I have never gone from the transformation parameters to the actual camera position and orientation, but I don't see any problem in doing that.
The specifics of the answer obviously depend on the specifics of the problem. If the OP wants more suggestions, I will keep an eye on this thread for a few days.
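The linear fit described above is usually done with the direct linear transform (DLT): each world/image point pair contributes two linear equations in the 12 entries of a 3x4 projection matrix, and the SVD gives the least-squares solution; the camera position then falls out as the null space of that matrix. This is a generic textbook sketch under those assumptions, not the procedure .Scott actually used, and the function names are made up.

```python
import numpy as np

def fit_projection_matrix(world_pts, image_pts):
    """Least-squares fit of a 3x4 projection matrix P from six or more
    world <-> image correspondences (DLT).  Illustrative sketch only."""
    A = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # Two rows per correspondence, from u = P1.x / P3.x, v = P2.x / P3.x
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    # The right singular vector for the smallest singular value solves
    # A p = 0 in the least-squares sense (p defined up to scale).
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1].reshape(3, 4)

def camera_center(P):
    """Camera position C: the null space of P, since P @ [C; 1] = 0."""
    _, _, Vt = np.linalg.svd(P)
    C = Vt[-1]
    return C[:3] / C[3]
```

With exact, non-degenerate data (points not all coplanar), six correspondences recover P up to scale, and `camera_center` returns the focal point; with noisy or redundant points the same SVD gives the best fit, which matches the "too many points" case above.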

.Scott
Homework Helper
As I said, it's very application-specific. I never attempted to solve it "cold" given only three points. I required that the user specify four points - and preferably 5 or 6. That way I could validate the input. It was too easy for an analyst to misidentify a landmark on either the map or the film.
When I say "cold": sometimes there was other information available that I could use to determine the mapping. "Cold" was when that other information was not available.