How to estimate 2 points on the unit circle

In summary: The angle of all the data points with respect to the unit circle.In summary, mfb is looking for a method to estimate two points on a unit circle with limited computational power. He suggests assuming Gaussian noise and finding the average. Next, he calculates the angle of all the data points and uses this information to find the dividing line between the clusters. Once this line is found, he can sort the points to the clusters.
  • #1
weetabixharry
111
0
I have some noisy data (x-y coordinates) containing two distinct clusters of data. Each cluster is centred at an unknown point on the unit circle. How can I estimate these two points (green points in the diagram)?
pair_of_points.png

We can assume the noise is Gaussian and noise power is equal in x and y directions.

Computational power is extremely limited, so I'm probably looking for a non-iterative approach involving as few 'complicated' operations (e.g. square roots) as possible.
 
Last edited:
Physics news on Phys.org
  • #2
Assume gaussian noise and find the average.

If there are only two points, then you have two possible unit circles - or do you also know that the circle centered on (0,0) is the correct one?

If you must have a point on the unit circle, then, having found the radial distribution function (i.e. parameter fitting off the data), you can plot a probability distribution for points on the circle.
 
  • #3
Simon Bridge said:
If you must have a point on the unit circle, then, having found the radial distribution function (i.e. parameter fitting off the data), you can plot a probability distribution for points on the circle.
That is possible, but it needs computing power.

Given the large number of points and their small spread relative to the size of the unit circle, I would first calculate the mean and then look for the point on the circle closest to this mean (this is just a normalization, a single square root, one division and 3 multiplications. And you might be able to use this weird method to speed it up).
If you need an uncertainty estimate for this point, use the one you got for the mean but restrict it to the direction of the circle.
 
  • #4
You could do a least squares fit using the projection matrix.

Let's assume that your results should conform to the equation of a circle: x^2 + y^2 = 1.

Take all your x and y data points from one cluster, sqaure them, and create a matrix of the form.

[itex]\left( \begin{array}{cc}
(x_{1})^{2} & (y_{1})^2 \\
(x_{2})^{2} & (y_{2})^2 \\
(x_{3})^{2} & (y_{3})^2 \end{array} \right)
\left( \begin{array}{c}
X^{2} \\ Y^2 \end{array} \right) = \left( \begin{array}{c}
1 \\
1 \\
1 \end{array} \right)

[/itex]

Define:

[itex]A = \left( \begin{array}{cc}
(x_{1})^{2} & (y_{1})^2 \\
(x_{2})^{2} & (y_{2})^2 \\
(x_{3})^{2} & (y_{3})^2 \end{array} \right)
[/itex]

[itex]C = \left( \begin{array}{c}
1 \\
1 \\
1 \end{array} \right)
[/itex]

and:

[itex]A\left( \begin{array}{c}
X^{2} \\ Y^2 \end{array} \right) = C
[/itex]


You clearly have enough data points for the matrix A to be overdetermined so we can use the projection matrix formula. Solve for X^2 and Y^2 like so.

[itex]
\left( \begin{array}{c}
X^{2} \\ Y^2 \end{array} \right) = (A^{T}A)^{-1}A^{T}C

[/itex]

Now you have a least squares fit for X^2 and Y^2. Take the square root of these to get an x and y value.

This is all pretty easy to do in Matlab.
 
  • #5
Simon Bridge said:
do you also know that the circle centered on (0,0) is the correct one?
Yes, to clarify: the circle is definitely centred on (0,0).

Simon Bridge said:
If you must have a point on the unit circle, then, having found the radial distribution function (i.e. parameter fitting off the data), you can plot a probability distribution for points on the circle.
As mfb suggests, I have a feeling this would require too much computation. Computational simplicity is a critical requirement.

mfb said:
Given the large number of points and their small spread relative to the size of the unit circle, I would first calculate the mean and then look for the point on the circle closest to this mean
If I understand correctly, this would give me only one point (located somewhere between the two points I am trying to estimate).

Okefenokee said:
Take all your x and y data points from one cluster
How do I do this? If I could separate the clusters, then I would be happy to simply take the mean (x,y) location of each cluster... then normalise to the unit circle. Perhaps this is similar to what mfb was suggesting. However, I think separating the clusters is a difficult problem in itself.

To clarify: every sample belongs to one cluster or the other, but this assignment is random and unknown.
 
Last edited:
  • #6
weetabixharry said:
If I understand correctly, this would give me only one point (located somewhere between the two points I am trying to estimate).
This was meant for each cluster, you have to separate them first. If your plot is typical then this should be easy, every clustering algorithm can do this (with a small subset of your data to save time). Once you have approximate cluster centers, you can calculate a dividing line in between and sort the points to the clusters.

If you clusters can overlap, this problem gets significantly harder.
 
  • #7
mfb said:
every clustering algorithm can do this
Would you be able to recommend a nice simple one? I did a bit of googling and k-means clustering shows up a lot... but it would appear to be an iterative method, which makes me reluctant to use it.

I suppose a good initial step would be to compute the angle of all the data points with respect to the coordinate axes. After all, the solutions I'm looking for are essentially just this one angle parameter. (I could use a look-up table to implement a suitably accurate inverse tangent function). This would then convert the task to a 1D problem of clustering the angles. Presumably there are some super neat ways of clustering in 1D (though not much seems to be showing up on Google).
 
Last edited:
  • #8
weetabixharry said:
How do I do this? If I could separate the clusters, then I would be happy to simply take the mean (x,y) location of each cluster... then normalise to the unit circle. Perhaps this is similar to what mfb was suggesting. However, I think separating the clusters is a difficult problem in itself.

To clarify: every sample belongs to one cluster or the other, but this assignment is random and unknown.

This is a quick and dirty way to do it in pseudocode.

Code:
if(arctan(y/x) > 45 degrees){

add_to_data_cluster_1(x,y);

}else{

add_to_data_cluster_2(x,y);

}

edit:

You did say that computation was limited so you could use this too:

Code:
if(x > 0.45){

add_to_data_cluster_1(x,y);

}else{

add_to_data_cluster_2(x,y);

}
 
Last edited:
  • #9
@Okefenokee: How do you know the clusters are like that?

@weetabixharry: Yes, cluster algorithms are iterative, but as soon as your cluster centers are somewhere near the clusters you can stop. Can you be sure your clusters are always well-separated?
 
  • #10
Okefenokee said:
Code:
if(arctan(y/x) > 45 degrees){
I would need to replace this "45 degrees" with an angle that separates the two clusters (we don't know this angle in advance, so would need to estimate it from the data). The mean angle across all the data points may seem like an obvious choice, but there is no guarantee that both clusters will have a similar number of points. Therefore, the mean could be badly biased to one side. A more sophisticated estimate of the midpoint would possibly require sorting the data (or at least some of it), which is likely to be a bit too computational. Hmm...
 
  • #11
mfb said:
Can you be sure your clusters are always well-separated?
Unfortunately, I cannot. Ideally, I would like to be able to handle heavily overlapping clusters. It's funny that it's so easy to locate the cluster centres by eye, but identifying a really elegant algorithm for doing it is not straightforward.
 
  • #12
Well, it would help to know more about your data.
You could make 32 bins in the angle, for example (those are easy to fill with a lookup table), fill your points in, and then look for clusters. Cluster analysis of 32 ordered values should be fast.

Edit: Okay, then you'll need some better algorithm. Sorting by cluster does not work if your clusters overlap.

Find the regions with data with a few bins in the angle, if that is not sufficient to separate the clusters make finer bins, then fit two gaussians?
 
  • #13
mfb said:
Find the regions with data with a few bins in the angle, if that is not sufficient to separate the clusters make finer bins, then fit two gaussians?
I'm working at something along these lines. The fitting of the Gaussians is stumping me slightly at the moment.
 
Last edited:
  • #14
The amount of computing power was not specified. I was able to do the kinds of analysis I suggested using MATLAB under win4lin on a work pentium IV... something... box... back in the day.

See "inverse problems" for general methods.
http://home.comcast.net/~szemengtan/
... more than needed here, incl. fitting gaussians.
 
Last edited by a moderator:
  • #15
Simon Bridge said:
The amount of computing power was not specified. I was able to do the kinds of analysis I suggested using MATLAB under win4lin on a work pentium IV... something... box... back in the day.
I'm looking at an embedded application, so something more like a dirt-cheap microcontroller than a big sophisticated Pentium processor.

Simon Bridge said:
See "inverse problems" for general methods.
http://home.comcast.net/~szemengtan/
... more than needed here, incl. fitting gaussians.
Thanks - these notes look pretty good. I haven't found anything on Gaussian fitting yet, but I'll keep ploughing through.
 
Last edited by a moderator:
  • #16
weetabixharry said:
I'm looking at an embedded application, so something more like a dirt-cheap microcontroller than a big sophisticated Pentium processor.
Even a cheap microcontroller can solve the most difficult problems with enough time. How fast does the calculation have to be?
 
  • #17
mfb said:
Even a cheap microcontroller can solve the most difficult problems with enough time. How fast does the calculation have to be?
I can tolerate a fair amount of latency, but the data will ultimately have to be streaming through at full rate. Plus, the more resources I can save on this task, the more I'll have left over to do everything else.
 
  • #18
What is a "full rate"? I'm not asking about a percent-accuracy. But there is a huge difference between 1 fit per second and 1 million.
 
  • #19
What's wrong with:
Take the mean of the x positions and the mean of the y positions to get a point (a,b) [for each sample set]
The point on the unit circle closest to (a,b) would be:
x=(a/|a|)a/(a^2+b^2), y=(b/|b|)b^2/(a^2+b^2)

...should be pretty fast.

To get faster - change the estimator for the mean and adjust how many data points you collect.

If it is something you want to be able to do for a real-time response, you will want some refining estimation process like a Beyesian analysis.

We don't really know enough about the problem constraints to help you more.

"cheap microcontroller"s these days have orders of magnitude more processing power than the 8086's I grew up with, and they could do these sorts of analysis in an eyeblink. Have you tried just implementing a solution and timing it?

The inverse problems notes start out talking about "gaussian noise" ... you would describe your data as being a point in the unit circle with gaussian noise about it.

so you have ##\vec d = \vec x + \vec n## where ##\vec x## is just the same point for each data point.

I don't think there is a section called "gaussian fitting" - though there is a section on fitting curves.

But what it boils down to is your project constraints - which you have to be able to quantify.
 
  • #20
Simon Bridge said:
[for each sample set]
How do you do this with overlapping distributions? In general we don't know the set a point belongs to.

The point on the unit circle closest to (a,b) would be:
x=(a/|a|)a/(a^2+b^2), y=(b/|b|)b^2/(a^2+b^2)

...should be pretty fast.
That's the suggestion I gave in post 3.
 
  • #21
Is this a general situation though? The example given does not show overlapping samples.
If overlapping samples cannot be ruled out, then there are additional problems yes.
Have we been told how the data is collected and I missed it?

Sure, if all you have is a bunch of undifferentiated numbers coming in then you need some way to sort them to use the method. That's going to cost you no matter what. There is also a problem if the distributions are very non-gaussian... or if any of the underlying assumptions are invalid.

I was hoping for some feedback from OP about why this is not a good idea before suggesting an alternative.
Basically, speeding up the calculation will involve making assumptions about

For the record: I realize and here acknowledge that you have previously suggested that method of getting a point on the circle from the mean of the sample distribution.
 
  • #22
Simon Bridge said:
I was hoping for some feedback from OP about why this is not a good idea before suggesting an alternative.
As I tried to explain, I consider proper segmentation of the points to be a difficult problem in its own right. Once the clusters are differentiated, locating their respective centres is straightforward, so segmenting would certainly be one route to a solution. On the other hand, it is conceivable that segmenting is (in some sense) fundamentally more difficult than simply locating the cluster centres, in which case it would not be the best route to a solution.

Still, thank you for putting so much emphasis on the segmentation idea, as it forced me to think a lot about it. As such, I've come up with two solutions that are fast and accurate: one based on segmentation, the other based on direct estimation without segmentation. I'll do a proper analysis of their performance and complexity and report back here with my findings.
 
  • #23
Excellent. Looking forward to see what you come up with.

I think there's a bit in those lecture notes about how to tell if a graph has one peak or several ...

If you know the center of the circle is on the origin, then you may get a quick estimator by converting your data to polar coordinates. A dot-plot vs angle should give a useful distribution to work with.
Binning the data can give you frequency vs angle - the peaks of which will be the points you want.
Though this approach would require more data points.

If you have reason to believe exactly two points are giving rise to the data, that can help.

If you collect sample data-sets you should be able to work out how well differentiated they are, which can lead you to useful assumptions to shortcut the process. The more you know about the data distribution beforehand the better.
 
  • #24
FWIW - in many cases there are fast workarounds for extensive floating point sqrt() operations . When Paul Hsieh was at Sun he put up a square root page, and kept it up for years. Programmers still refer to it:

http://www.azillionmonkeys.com/qed/sqroot.html

It may help with your coding.
 

1. How do you find the coordinates of a point on the unit circle?

To find the coordinates of a point on the unit circle, we use the trigonometric ratios sine and cosine. The x-coordinate is equal to the cosine of the angle in radians, and the y-coordinate is equal to the sine of the angle.

2. What is the significance of the unit circle in mathematics?

The unit circle is significant in mathematics because it represents the relationship between angles and their corresponding trigonometric functions. It is used to solve problems involving triangles and is a fundamental concept in trigonometry.

3. How do you estimate the coordinates of a point on the unit circle?

To estimate the coordinates of a point on the unit circle, we can use the Pythagorean theorem to find the length of the hypotenuse of a right triangle with one side equal to 1 (since it is a unit circle). Then, we can use the trigonometric ratios to find the coordinates based on the given angle.

4. Can you estimate points on the unit circle without using trigonometric functions?

Yes, it is possible to estimate points on the unit circle without using trigonometric functions. One method is to use the geometric definition of the unit circle, which states that the circle has a radius of 1 and is centered at the origin. Another method is to use the Cartesian coordinates of points on the circle, which can be found using the Pythagorean theorem.

5. How can the unit circle be used to solve real-world problems?

The unit circle can be used to solve real-world problems involving angles and distances. For example, it can be used in navigation to determine the direction and distance between two points. It is also used in engineering and physics to analyze the motion of objects and the forces acting on them.

Similar threads

  • Set Theory, Logic, Probability, Statistics
Replies
23
Views
2K
  • Set Theory, Logic, Probability, Statistics
Replies
8
Views
1K
  • Set Theory, Logic, Probability, Statistics
Replies
2
Views
975
  • Set Theory, Logic, Probability, Statistics
Replies
4
Views
906
Replies
8
Views
923
  • New Member Introductions
Replies
1
Views
72
Replies
8
Views
1K
Replies
5
Views
2K
Replies
1
Views
772
  • General Math
Replies
2
Views
1K
Back
Top