# Geodesics between Images

I was told that in computer vision, photos of, say, an apple and an orange exist as points on some high-dimensional manifold, and that the goal is to learn a geodesic between the two objects.

What does this mean? Is each photo just a tuple of coordinates? Do the objects need to appear in the same photo to lie on the same manifold, or can they be in different photos?

Do we build the geodesic along patches of the space? Most of the space would be empty, since the data is discrete and sparse.

Telemachus

jedishrfu
Mentor
This is a very fuzzy question. Can you provide a reference to look at, rather than a "someone told me" source?

My closest thought would be a neural net trained to discriminate between the two objects, which in code would be represented by matrices. However, I can't see how a geodesic is involved in any way.

I found this reference, but it's for geodesics on a topological map, which makes a lot more sense.

https://www.mathworks.com/matlabcen...esic-distance-between-two-points-on-an-image?

Telemachus
I think the idea has to do with the distribution of the data. If the data lies on a manifold, then Euclidean distance cuts through the ambient space the manifold is embedded in rather than staying on the manifold itself, which is problematic.

The topic is manifold learning.

I think there's an example of distance on a Swiss roll manifold on page 9 of the attachment.

Also, here is an image. The Euclidean distance is misleading on the roll: projecting the points downward, the blue point seems close to the red one, but within the higher-dimensional manifold the distance is much greater, and the red geodesic is the correct path. I just don't know what those points mean for images. I was thinking that images that are dissimilar would have longer geodesics between them.

Also, the image is from this website.
http://www.indiana.edu/~dll/B657/B657_lec_isomap_lle.pdf
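If it helps to make this concrete, here's a minimal Isomap-style sketch in pure Python: sample a 2-D "Swiss roll" (a spiral), build a nearest-neighbour graph, and compare straight-line distance with shortest-path distance through the graph. The spiral, the sample spacing, and k=4 are my own choices for illustration, not from the slides:

```python
import math
import heapq

# Sample a 2-D "Swiss roll" (an Archimedean spiral r = t): points on
# adjacent turns are close in the plane but far apart along the curve.
ts = [1.5 * math.pi + 0.05 * i for i in range(189)]
pts = [(t * math.cos(t), t * math.sin(t)) for t in ts]

def euclid(p, q):
    return math.dist(p, q)

# Step 1 of Isomap: connect each sample to its k nearest neighbours.
def knn_graph(points, k=4):
    graph = {i: [] for i in range(len(points))}
    for i, p in enumerate(points):
        nearest = sorted((euclid(p, q), j)
                         for j, q in enumerate(points) if j != i)[:k]
        for d, j in nearest:
            graph[i].append((j, d))
            graph[j].append((i, d))
    return graph

# Step 2: shortest path through the graph (Dijkstra) approximates the
# geodesic distance measured along the manifold.
def geodesic(graph, src, dst):
    best = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > best.get(u, math.inf):
            continue
        for v, w in graph[u]:
            if d + w < best.get(v, math.inf):
                best[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return math.inf

graph = knn_graph(pts)
a, b = 0, 126  # two points one full turn apart, radially adjacent
print(euclid(pts[a], pts[b]))   # straight-line distance: ~6.3
print(geodesic(graph, a, b))    # distance along the spiral: ~50
```

The two points look close in the plane (about 6.3 apart) because the turns of the spiral are stacked next to each other, but walking along the curve itself takes roughly 50 units — the same effect as the blue vs. red paths in the Swiss roll figure.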

#### Attachments

• ThompsonDimensionalityReduction.pdf
787 KB
jedishrfu
Mentor
You could compare it to the stars in a constellation. When you view the sky, two stars may appear very close together, especially if they are similar in brightness, because you are judging distance by angular separation alone. However, once you account for the radial distance in 3D space, one star may turn out to be extremely bright and extremely far away.

As a counterexample, two stars may appear very distant from an angular point of view but still belong to the same cluster of stars, simply because the cluster is close to us.

Here's an animation using the constellation of Orion:

The starting view is Orion as seen from the Earth; the animation then rotates the constellation by 90 degrees, and now you can see the true distances involved.
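As a rough numerical sketch of the same point (the coordinates and distances below are made up for illustration, not real star data): two stars one degree apart on the sky can be hundreds of light-years apart once the radial coordinate is included.

```python
import math

def direction_to_xyz(ra_deg, dec_deg, dist):
    """Sky direction (right ascension / declination, degrees) plus a
    radial distance -> Cartesian coordinates."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (dist * math.cos(dec) * math.cos(ra),
            dist * math.cos(dec) * math.sin(ra),
            dist * math.sin(dec))

# Two hypothetical stars ~1 degree apart on the sky,
# but at 100 ly vs 800 ly from us.
a = direction_to_xyz(85.0, -1.0, 100.0)
b = direction_to_xyz(86.0, -1.0, 800.0)

angular_sep_deg = 1.0        # what the eye judges from Earth
sep_3d = math.dist(a, b)     # ~700 ly once depth is included
print(sep_3d)
```

Adding the extra dimension (radial distance) completely changes which points count as "near" each other — the same move as going from the projected plane to the full manifold.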

In data mining, distances are used to group data points together, so adding one more dimension to your data might bring together two points that seemed very distant before. In the example you gave, you are moving from the manifold's way of measuring distance to a different mapping and a different metric, which brings two points a lot closer together.

The best analogy I can think of is comparing customers by income: one making $100K/yr and another making $50K/yr. By that measurement alone they seem very far apart.

However, if we now consider cost of living, the $100K person might live in NYC, where a lot of his/her income is needed to survive, whereas the $50K/yr person living in a smaller city may not have those expenses, at least not to that degree. Their adjusted incomes might come out to $30K (from $100K) and $20K (from $50K), and now you see that their true buying power is much closer.
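In code the adjustment is just a change of coordinates before measuring distance (the cost-of-living figures here are illustrative, matching the numbers above):

```python
# Raw incomes look far apart...
raw = {"nyc": 100_000, "small_city": 50_000}
gap_raw = abs(raw["nyc"] - raw["small_city"])   # 50,000

# ...but subtracting an (illustrative) cost of living gives
# effective buying power, and the gap shrinks a lot.
cost_of_living = {"nyc": 70_000, "small_city": 30_000}
adjusted = {k: raw[k] - cost_of_living[k] for k in raw}
gap_adjusted = abs(adjusted["nyc"] - adjusted["small_city"])  # 10,000

print(gap_raw, gap_adjusted)
```

Same two customers, but measured in the adjusted coordinates they are five times closer — the metric you choose decides who counts as a neighbour.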

Ibix
I think the idea is that if you have a set of images that are connected locally (like a sequence of images of a golfer, where you know the order of the images but don't necessarily know how many frames lie between any two of them) and can extract some parameters from them (e.g. the angle of the club and the rotation of the upper body), then you have a network in the parameter space. If it's a reasonably friendly network (for instance, the points directly connected to any given one are close to it), then it is reasonable to approximate the points as lying on a smooth manifold embedded in the parameter space. Then you can estimate the distance between any two pictures in terms of distances along the manifold instead of finding your way through the network.

You could just use Euclidean distance in the parameter space, but that fails for cases like the golf swing, where "lining up the shot" and "club hits the ball" look quite similar yet are separated in truth by a great big loop of backswing and drive pictures. The manifold in this case is the curve traced through parameter space by the sequence of pictures.

That's how I read those notes, anyway. I'm not an expert or anything.

jedishrfu
Mentor
I think @Ibix got it right here. I hadn't considered the notion of comparing successive frames in a video as points on a manifold, but that makes a lot of sense.
