# How to estimate 2 points on the unit circle

1. May 17, 2014

### weetabixharry

I have some noisy data (x-y coordinates) containing two distinct clusters of data. Each cluster is centred at an unknown point on the unit circle. How can I estimate these two points (green points in the diagram)?
We can assume the noise is Gaussian and noise power is equal in x and y directions.

Computational power is extremely limited, so I'm probably looking for a non-iterative approach involving as few 'complicated' operations (e.g. square roots) as possible.

Last edited: May 17, 2014
2. May 17, 2014

### Simon Bridge

Assume gaussian noise and find the average.

If there are only two points, then you have two possible unit circles - or do you also know that the circle centered on (0,0) is the correct one?

If you must have a point on the unit circle, then, having found the radial distribution function (i.e. parameter fitting off the data), you can plot a probability distribution for points on the circle.

3. May 17, 2014

### Staff: Mentor

That is possible, but it needs computing power.

Given the large number of points and their small spread relative to the size of the unit circle, I would first calculate the mean and then look for the point on the circle closest to this mean (this is just a normalization, a single square root, one division and 3 multiplications. And you might be able to use this weird method to speed it up).
If you need an uncertainty estimate for this point, use the one you got for the mean but restrict it to the direction of the circle.

4. May 17, 2014

### Okefenokee

You could do a least squares fit using the projection matrix.

Let's assume that your results should conform to the equation of a circle: x^2 + y^2 = 1.

Take all your x and y data points from one cluster, sqaure them, and create a matrix of the form.

$\left( \begin{array}{cc} (x_{1})^{2} & (y_{1})^2 \\ (x_{2})^{2} & (y_{2})^2 \\ (x_{3})^{2} & (y_{3})^2 \end{array} \right) \left( \begin{array}{c} X^{2} \\ Y^2 \end{array} \right) = \left( \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right)$

Define:

$A = \left( \begin{array}{cc} (x_{1})^{2} & (y_{1})^2 \\ (x_{2})^{2} & (y_{2})^2 \\ (x_{3})^{2} & (y_{3})^2 \end{array} \right)$

$C = \left( \begin{array}{c} 1 \\ 1 \\ 1 \end{array} \right)$

and:

$A\left( \begin{array}{c} X^{2} \\ Y^2 \end{array} \right) = C$

You clearly have enough data points for the matrix A to be overdetermined so we can use the projection matrix formula. Solve for X^2 and Y^2 like so.

$\left( \begin{array}{c} X^{2} \\ Y^2 \end{array} \right) = (A^{T}A)^{-1}A^{T}C$

Now you have a least squares fit for X^2 and Y^2. Take the square root of these to get an x and y value.

This is all pretty easy to do in Matlab.

5. May 17, 2014

### weetabixharry

Yes, to clarify: the circle is definitely centred on (0,0).

As mfb suggests, I have a feeling this would require too much computation. Computational simplicity is a critical requirement.

If I understand correctly, this would give me only one point (located somewhere between the two points I am trying to estimate).

How do I do this? If I could separate the clusters, then I would be happy to simply take the mean (x,y) location of each cluster... then normalise to the unit circle. Perhaps this is similar to what mfb was suggesting. However, I think separating the clusters is a difficult problem in itself.

To clarify: every sample belongs to one cluster or the other, but this assignment is random and unknown.

Last edited: May 17, 2014
6. May 17, 2014

### Staff: Mentor

This was meant for each cluster, you have to separate them first. If your plot is typical then this should be easy, every clustering algorithm can do this (with a small subset of your data to save time). Once you have approximate cluster centers, you can calculate a dividing line in between and sort the points to the clusters.

If you clusters can overlap, this problem gets significantly harder.

7. May 17, 2014

### weetabixharry

Would you be able to recommend a nice simple one? I did a bit of googling and k-means clustering shows up a lot... but it would appear to be an iterative method, which makes me reluctant to use it.

I suppose a good initial step would be to compute the angle of all the data points with respect to the coordinate axes. After all, the solutions I'm looking for are essentially just this one angle parameter. (I could use a look-up table to implement a suitably accurate inverse tangent function). This would then convert the task to a 1D problem of clustering the angles. Presumably there are some super neat ways of clustering in 1D (though not much seems to be showing up on Google).

Last edited: May 17, 2014
8. May 17, 2014

### Okefenokee

This is a quick and dirty way to do it in pseudocode.

Code (Text):
if(arctan(y/x) > 45 degrees){

}else{

}
edit:

You did say that computation was limited so you could use this too:

Code (Text):
if(x > 0.45){

}else{

}

Last edited: May 17, 2014
9. May 17, 2014

### Staff: Mentor

@Okefenokee: How do you know the clusters are like that?

@weetabixharry: Yes, cluster algorithms are iterative, but as soon as your cluster centers are somewhere near the clusters you can stop. Can you be sure your clusters are always well-separated?

10. May 17, 2014

### weetabixharry

I would need to replace this "45 degrees" with an angle that separates the two clusters (we don't know this angle in advance, so would need to estimate it from the data). The mean angle across all the data points may seem like an obvious choice, but there is no guarantee that both clusters will have a similar number of points. Therefore, the mean could be badly biased to one side. A more sophisticated estimate of the midpoint would possibly require sorting the data (or at least some of it), which is likely to be a bit too computational. Hmm...

11. May 17, 2014

### weetabixharry

Unfortunately, I cannot. Ideally, I would like to be able to handle heavily overlapping clusters. It's funny that it's so easy to locate the cluster centres by eye, but identifying a really elegant algorithm for doing it is not straightforward.

12. May 17, 2014

### Staff: Mentor

You could make 32 bins in the angle, for example (those are easy to fill with a lookup table), fill your points in, and then look for clusters. Cluster analysis of 32 ordered values should be fast.

Edit: Okay, then you'll need some better algorithm. Sorting by cluster does not work if your clusters overlap.

Find the regions with data with a few bins in the angle, if that is not sufficient to separate the clusters make finer bins, then fit two gaussians?

13. May 17, 2014

### weetabixharry

I'm working at something along these lines. The fitting of the Gaussians is stumping me slightly at the moment.

Last edited: May 17, 2014
14. May 17, 2014

### Simon Bridge

The amount of computing power was not specified. I was able to do the kinds of analysis I suggested using matlab under win4lin on a work pentium IV... something... box... back in the day.

See "inverse problems" for general methods.
http://home.comcast.net/~szemengtan/ [Broken]
... more than needed here, incl. fitting gaussians.

Last edited by a moderator: May 6, 2017
15. May 18, 2014

### weetabixharry

I'm looking at an embedded application, so something more like a dirt-cheap microcontroller than a big sophisticated Pentium processor.

Thanks - these notes look pretty good. I haven't found anything on Gaussian fitting yet, but I'll keep ploughing through.

Last edited by a moderator: May 6, 2017
16. May 18, 2014

### Staff: Mentor

Even a cheap microcontroller can solve the most difficult problems with enough time. How fast does the calculation have to be?

17. May 18, 2014

### weetabixharry

I can tolerate a fair amount of latency, but the data will ultimately have to be streaming through at full rate. Plus, the more resources I can save on this task, the more I'll have left over to do everything else.

18. May 18, 2014

### Staff: Mentor

What is a "full rate"? I'm not asking about a percent-accuracy. But there is a huge difference between 1 fit per second and 1 million.

19. May 18, 2014

### Simon Bridge

What's wrong with:
Take the mean of the x positions and the mean of the y positions to get a point (a,b) [for each sample set]
The point on the unit circle closest to (a,b) would be:
x=(a/|a|)a/(a^2+b^2), y=(b/|b|)b^2/(a^2+b^2)

...should be pretty fast.

To get faster - change the estimator for the mean and adjust how many data points you collect.

If it is something you want to be able to do for a real-time response, you will want some refining estimation process like a Beyesian analysis.

"cheap microcontroller"s these days have orders of magnitude more processing power than the 8086's I grew up with, and they could do these sorts of analysis in an eyeblink. Have you tried just implementing a solution and timing it?

The inverse problems notes start out talking about "gaussian noise" ... you would describe your data as being a point in the unit circle with gaussian noise about it.

so you have $\vec d = \vec x + \vec n$ where $\vec x$ is just the same point for each data point.

I don't think there is a section called "gaussian fitting" - though there is a section on fitting curves.

But what it boils down to is your project constraints - which you have to be able to quantify.

20. May 19, 2014

### Staff: Mentor

How do you do this with overlapping distributions? In general we don't know the set a point belongs to.

That's the suggestion I gave in post 3.