What is the concept of geodesics between images in computer vision AI?


Discussion Overview

The discussion centers on the concept of geodesics in the context of computer vision AI, particularly regarding how images of objects, such as an apple and an orange, can be represented within a high-dimensional manifold. Participants explore the implications of this representation for understanding distances and relationships between images, as well as the challenges posed by discrete and sparse data.

Discussion Character

  • Exploratory
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant questions whether images exist as tuples of coordinates on a manifold and whether objects must be in the same photo to belong to the same manifold.
  • Another participant suggests that a neural network could discriminate between objects, but expresses skepticism about the relevance of geodesics in this context.
  • A different viewpoint emphasizes the importance of understanding data distribution on a manifold, noting that Euclidean distances may not accurately reflect distances on the manifold itself.
  • One participant uses an analogy of stars in a constellation to illustrate how perceived distances can differ from true distances when considering additional dimensions.
  • Another participant proposes that sequences of images can be treated as points in a manifold, where distances can be estimated along the manifold rather than through the parameter space directly.
  • There is acknowledgment from another participant that the idea of comparing successive frames in a video as points in a manifold is a valid perspective.

Areas of Agreement / Disagreement

Participants express a range of views on the concept of geodesics in relation to images and manifolds, with no clear consensus reached. Some participants agree on the relevance of manifold learning, while others challenge the connection to geodesics.

Contextual Notes

Participants note limitations related to the discrete and sparse nature of data, as well as the potential for misleading interpretations of distances when using Euclidean metrics in manifold contexts.

FallenApple
I was told by someone that for computer vision AI, a photo of say an apple and an orange exists on some high dimensional manifold, and the goal is to learn a geodesic between the two objects.

What does this mean? Does it mean that a photo is just a tuple of coordinates? Do the objects need to be in the same photo to be on the same manifold, or can they be in different photos?

Do we connect the geodesic along patches of space? But most of the space is empty since the data is discrete and sparse.
 
This is a very fuzzy question. Can you provide a reference to look at, rather than a "someone told me" account?

My closest guess would be a neural net that discriminates between the two objects, which mathematically is represented by matrices of weights in code. However, I can't see how a geodesic is involved in any way.

I found this reference, but it's for geodesics on a topological map, which makes a lot more sense.

https://www.mathworks.com/matlabcen...esic-distance-between-two-points-on-an-image?
 
I think the idea has to do with the distribution of the data. If the data lies on a manifold, then Euclidean distance cuts through the ambient space the manifold is embedded in rather than running along the manifold itself, which is problematic.

The topic is manifold learning.

I think there's an example of distance on a Swiss roll manifold on page 9 of the attachment.

Also, here is an image. The Euclidean distance is misleading on the roll because, within the manifold, the distance is much greater. When the points are projected downwards, the blue point seems close to the red one, but in the unprojected higher-dimensional space of the manifold, the red geodesic is the correct path. I just don't know what those points would mean for images. I was thinking that dissimilar images would have longer geodesics between them.

[Attached image: SwissRoll.png — Euclidean vs. geodesic paths on a Swiss roll]
Also, the image is from this website.
http://www.indiana.edu/~dll/B657/B657_lec_isomap_lle.pdf
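The Isomap-style trick described in those slides can be sketched in a few lines: build a k-nearest-neighbour graph over the point cloud, then take shortest paths through that graph as estimates of geodesic distance along the manifold. This is only an illustrative sketch (the point count, k, and the two query points are arbitrary choices of mine, not from the slides); real code would use something like scikit-learn's Isomap.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)
n = 500
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, n)            # position along the roll
h = rng.uniform(0, 10, n)                               # height of the roll
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])  # 3-D Swiss roll

D = cdist(X, X)                          # straight-line (Euclidean) distances
k = 8
G = np.full_like(D, np.inf)              # inf = "no edge" for csgraph routines
nbrs = np.argsort(D, axis=1)[:, 1:k + 1]
for a, nb in enumerate(nbrs):
    G[a, nb] = D[a, nb]                  # keep only short, on-manifold hops
geo = shortest_path(G, method="D", directed=False)  # graph-based geodesics

# Pick two points on adjacent layers of the roll: close in 3-D space,
# but far apart if you have to travel along the sheet itself.
i = np.argmin(np.abs(t - 2 * np.pi) + np.abs(h - 5))
j = np.argmin(np.abs(t - 4 * np.pi) + np.abs(h - 5))
print(D[i, j], geo[i, j])   # the geodesic estimate is several times larger
```

The key point is that the graph only contains short hops between neighbours, so a shortest path is forced to wind along the roll rather than jump across the gap between layers.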
 

You could compare it to the stars in a constellation. When you view the stars, any two may appear very close, especially if they are similar in brightness, because you are using an angular measurement to judge the distance. However, once you consider the radial distance in 3D space, one star may turn out to be extremely bright but extremely far away.

As a counterexample, two stars may appear very distant from an angular point of view but still belong to the same cluster of stars, simply because the cluster is closer to us.

Here's an animation using the constellation of Orion:
[Embedded video: the constellation of Orion rotated in 3D]
The starting view is Orion as seen from Earth; the animation then rotates the constellation through 90 degrees so you can see the true distances involved.

In data mining, distances are used to group data points together, so adding one more dimension to your data might bring together two points that seemed very distant before. In the example you gave, you are moving from the manifold's way of measuring distance to a different mapping and a different metric, which can bring two points a lot closer together.

The best analogy I can think of is comparing customers by income: one making $100K/yr and another making $50K/yr. By that measurement alone they seem very far apart.

However, if we now consider cost of living, the $100K person might live in NYC, where a lot of their income is needed just to get by, whereas the $50K/yr person in a smaller city may not have those expenses, at least not to the same degree. Their incomes when adjusted might be $30K (from $100K) and $20K (from $50K), and now you see that their true buying power is much closer.
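A quick numerical version of that analogy (the cost-of-living multipliers here are invented purely for illustration):

```python
# Raw incomes look far apart, but after scaling by a (hypothetical)
# local buying-power factor, the "effective" incomes are much closer.
incomes = {"NYC": 100_000, "small city": 50_000}
factor = {"NYC": 0.30, "small city": 0.40}   # made-up cost-of-living adjustments

raw_gap = abs(incomes["NYC"] - incomes["small city"])
adjusted = {c: incomes[c] * factor[c] for c in incomes}
adj_gap = abs(adjusted["NYC"] - adjusted["small city"])

print(raw_gap, adj_gap)   # 50000 vs 10000.0: changing the metric shrinks the distance
```

The point is the same as with the stars: the distance you measure depends entirely on which coordinates and which metric you use.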
 
I think the idea is that if you have a set of images that are connected locally (like the sequence of images of a golfer, where you know the order of the images but don't necessarily know how many frames lie between any two of them), and you can extract some parameters from them (e.g. the angle of the club and the rotation of the upper body), then you have a network in the parameter space. If it's a reasonably friendly network (the points directly connected to any given one are close to it, and so on), then it is reasonable to approximate the points as lying on a smooth manifold embedded in the parameter space. You can then estimate the distance between any two pictures in terms of distances along the manifold, instead of finding your way through the network.

You could just use Euclidean distance in the parameter space, but that fails for cases like the golf swing, where "lining up the shot" and "club hits the ball" look quite similar yet are in truth separated by a great big loop of backswing and drive frames. The manifold in this case is the curve traced through the parameters extracted from each picture in sequence.

That's how I read those notes, anyway. I'm not an expert or anything.
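A toy numerical version of that golf-swing point, with a made-up two-parameter loop standing in for (club angle, body rotation):

```python
import numpy as np

# Frames of the swing as ordered points on a closed loop in parameter space.
# (The circle is purely illustrative; real extracted parameters would trace
# some messier closed curve.)
n = 100
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
frames = np.column_stack([np.cos(theta), np.sin(theta)])

def path_length(pts, i, j):
    """Sum of successive frame-to-frame distances from frame i to frame j."""
    seg = np.linalg.norm(np.diff(pts[i:j + 1], axis=0), axis=1)
    return seg.sum()

# "Lining up the shot" vs "club hits the ball": neighbours across the gap.
i, j = 1, n - 2
euclid = np.linalg.norm(frames[i] - frames[j])   # tiny straight-line distance
along = path_length(frames, i, j)                # nearly the whole loop
print(euclid, along)
```

The Euclidean distance between the two poses is tiny, but the distance along the sequence of frames is most of the loop's circumference, which is exactly why measuring along the manifold matters here.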
 
I think @Ibix got it right here. I hadn't considered the notion of comparing successive frames in a video as points in a manifold but that makes a lot of sense.
 