Beating linear LS on polynomials

In summary, the thread discusses a system described by a polynomial with unknown coefficients, which are to be estimated from as few samples as possible. Because the underlying process is nonlinear, the linear term tends to dominate the fit, making it difficult to estimate the higher-order coefficients accurately. Least squares (LS) is discussed as the natural solution, but it is questioned whether it always gives the best performance on the higher-order coefficients. Alternative approaches, such as orthogonal polynomials, are suggested, though practical details are left to the references. Overall, the discussion highlights how hard it is to obtain accurate estimates of the higher-order coefficients in such a system.
  • #1
divB
Hi all,

Suppose I have a system which can be described using something like:

[tex]y(t) = a_1 x(t) + a_2 x^2(t) + \dots + a_p x^p(t)[/tex]

I want to find the coefficients from samples of x(t) and y(t) (taken pairwise at the same times), using as few samples as possible.

Clearly this is linear in the parameters and can be recast as an LS problem. However, even though the estimation problem is linear, the underlying process is nonlinear, and this causes the linear part to "overshadow" the nonlinear parts (or the lower-order terms to overshadow the higher-order ones). For example, using few samples, a1 usually gets a good estimate while the error on the nonlinear parts is much higher.
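For concreteness, here is a minimal numpy sketch of that recasting (the sample values and p = 3 below are just placeholders):

[code]
import numpy as np

# Hypothetical samples x[k] = x(t_k), y[k] = y(t_k); p = 3 is just a placeholder.
p = 3
x = np.array([0.1, 0.4, -0.3, 0.8, -0.6, 0.2, 0.9, -0.5])
y = np.array([0.12, 0.55, -0.28, 1.35, -0.48, 0.23, 1.70, -0.42])

# Design matrix with columns x, x^2, ..., x^p (no constant term, as in the model).
A = np.column_stack([x**k for k in range(1, p + 1)])

# Ordinary least-squares estimate of (a_1, ..., a_p).
a_hat = np.linalg.lstsq(A, y, rcond=None)[0]
print(a_hat)
[/code]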

Is it proven that LS will always give the best performance in such scenario or is there hope that the error on the non-linear (or higher order) coefficients can be improved (at the cost of slightly larger error on the linear (or low order) coefficients)?

I think LS should attain the CRLB (and is therefore optimal), right?

On the other hand, I have some ideas why I think that we can do better but I didn't find a concrete way to do it. I'll wait with this to keep the initial posting short.

Thank you,
divB
 
  • #2
You'll have to be more specific on what you mean by 'best performance'.

Regular least squares minimizes the sum of the squared differences between the actual y-measurements and the fitted y-values over all the data points. I don't know how you would go about trying to regress the individual powers of the independent variable separately.
 
  • #3
The relative uncertainties depend on the x-values of your samples. A larger spread in x-values will give more sensitivity to higher powers of x.
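A quick numerical illustration of this (assuming unit-variance noise and the purely polynomial model from the first post): the LS coefficient covariance is ##\sigma^2 (A^T A)^{-1}##, and widening the x-range shrinks the uncertainty of the high-order coefficients dramatically.

[code]
import numpy as np

# Illustration: LS coefficient covariance is sigma^2 * inv(A^T A).
# Compare a narrow and a wide spread of x-values for the same number of samples.
def coeff_std(x, p, sigma=1.0):
    A = np.column_stack([x**k for k in range(1, p + 1)])
    cov = sigma**2 * np.linalg.inv(A.T @ A)
    return np.sqrt(np.diag(cov))          # standard errors of a_1, ..., a_p

p = 3
narrow = np.linspace(-0.1, 0.1, 50)       # small spread of x-values
wide = np.linspace(-1.0, 1.0, 50)         # ten times larger spread

print(coeff_std(narrow, p))               # error on a_3 is enormous
print(coeff_std(wide, p))                 # higher powers much better determined
[/code]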
 
  • #4
Hi all,

thanks so far.
"Best performance" was maybe the wrong term. What I mean is error to the "true" coefficients.
Suppose I create a "golden model" by taking 1M measurements.
When I use then only 50 measurements then, the difference to a1 would be the best and ap would have nearly nothing to do with the value taking with 1M measurements.
The reason is that the linear (lower) terms are dominant and "absorb" most of the measuremnt power.
The question is if I can improve the error of ap by sacrificing a little bit loss in the a1 term for example. Something like a "weigthing factor".

I tried many combinations of something like "use 25 samples, then fix a1, and then use 25 new samples (together with the 25 old ones) to estimate a2 and a3" (for p=3, e.g.).
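In code, one version of what I tried looks roughly like this (p = 3 and the 25/25 split as described; how exactly a1 is "fixed" is just one possible choice):

[code]
import numpy as np

# Two-stage idea: estimate a_1 from the first 25 samples, then fix it and
# estimate a_2, a_3 from all 50 samples (p = 3).
def two_stage_fit(x, y, p=3, n1=25):
    # Stage 1: full LS on the first n1 samples, keep only a_1.
    A1 = np.column_stack([x[:n1]**k for k in range(1, p + 1)])
    a1 = np.linalg.lstsq(A1, y[:n1], rcond=None)[0][0]

    # Stage 2: subtract the fixed linear part and fit the remaining
    # coefficients on the full data set.
    A2 = np.column_stack([x**k for k in range(2, p + 1)])
    rest = np.linalg.lstsq(A2, y - a1 * x, rcond=None)[0]
    return np.concatenate(([a1], rest))
[/code]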

@mfb: Are you sure this is true? This sounds counterintuitive to me. A larger spread in the x-values corresponds to a larger sampling period, right? This in turn means a lower sampling rate, which "cuts off" even more information from the higher-order contributions (because the bandwidth of the powers of "x" grows in proportion to "p").

Is there any pointer to alternative estimation algorithms or different LS flavors which could be beneficial here?

Thanks
divB
 
  • #5
divB said:
I tried many combinations of something like "use 25 samples, then fix a1, and then use 25 new samples (together with the 25 old ones) to estimate a2 and a3" (for p=3, e.g.).
That does not help. LS with your total dataset is the best you can do based on that dataset.

@mfb: Are you sure this is true?
Yes.
This sounds counterintuitive to me. A larger spread in the x-values corresponds to a larger sampling period, right?
That depends on x(t).
This in turn means a lower sampling rate, which "cuts off" even more information from the higher-order contributions (because the bandwidth of the powers of "x" grows in proportion to "p").
You are fitting a polynomial, not Fourier coefficients.

Simple example:
If all coefficients are of the order of 1, the value at ##x=10^{-4}## is ##y(10^{-4})=a_1 10^{-4} + a_2 10^{-8} + a_3 10^{-12} + \dots \approx a_1 10^{-4}## - you have no sensitivity to the higher-order coefficients.
At ##x=1##, all contributions are similar: ##y(1)=a_1 + a_2 + a_3 + \dots +a_p##
At ##x=10^4##, the largest power dominates: ##y(10^4)=a_1 10^4 + a_2 10^8 + \dots +a_p 10^{4p} \approx a_p 10^{4p}##.

It does not matter what magnitude your coefficients have: larger x will always give more sensitivity to the higher powers.
 
  • #6
In terms of the mathematics, it doesn't matter how you write your polynomial function. But if you want to do numerical work the form in your OP is very ill-conditioned. If ##p## is fairly large (say 10) the term ##a_px^p(t)## contributes very little to the function except for when ##|x|## is close to its maximum value.

Numerically, it is better to write the function as a sum of orthogonal polynomials, for example Chebyshev polynomials (see http://mathworld.wolfram.com/ChebyshevPolynomialoftheFirstKind.html)

For a given set of data points, it is possible to construct an "optimal" set of polynomials which are exactly orthogonal at those data points. The least squares equations then become diagonal. That idea is useful if you have many sets of data at the same ##x## locations, because you only need to calculate the orthogonal polynomials once and then use them for each set of data values.
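As a rough numpy sketch of that construction (the QR decomposition of the design matrix produces exactly such a basis, orthonormal at the sample points; for the Chebyshev route, numpy.polynomial.chebyshev.chebfit does the fit directly):

[code]
import numpy as np

# Sketch: the QR decomposition of the design matrix gives a polynomial basis
# that is orthonormal at the sample points, so the LS system becomes trivial.
def orthogonal_fit(x, y, p):
    A = np.column_stack([x**k for k in range(1, p + 1)])   # monomial basis
    Q, R = np.linalg.qr(A)      # columns of Q: orthonormal at the x-values
    c = Q.T @ y                 # coefficients in the orthogonal basis
    a = np.linalg.solve(R, c)   # convert back to a_1, ..., a_p if desired
    return a, c

# With many data sets sharing the same x locations, Q and R are computed once
# and only the cheap products Q.T @ y are repeated per data set.
[/code]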

The practical details will be in any good book or website on least squares methods.
 
  • #7
That does not help. LS with your total dataset is the best you can do based on that dataset.

I guess this is "general" knowledge, but do you have a pointer by any chance? Is it that LS attains the CRLB?

Simple example:

Thanks, got it!

Maybe I was oversimplifying it too much: first, it is more than just a polynomial (more like a Wiener system). Second, I do not have direct access to y(t) or x(t), only to samples taken at a specific (high) rate. So I have, let's say, 64k samples, all taken 9 ns apart, and I can only use 100 of them. What I can do, of course, is to pick wisely which samples I use. Currently I get the best results if I choose the 100 out of the 64k randomly. However, that's still not perfect and not deterministic.
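One deterministic alternative I could imagine, just as a sketch: spread the 100 picks evenly over the observed range of |x| instead of choosing them at random.

[code]
import numpy as np

# Deterministic alternative to the random pick (just a sketch): spread the
# chosen samples evenly over the observed range of |x|, since a larger spread
# in x helps the higher-order terms.
def pick_by_spread(x_all, n_pick=100):
    order = np.argsort(np.abs(x_all))                 # rank all samples by |x|
    ranks = np.linspace(0, len(x_all) - 1, n_pick)    # evenly spaced ranks
    return order[ranks.astype(int)]                   # indices into x_all, y_all

# Usage: sel = pick_by_spread(x_all); then fit on x_all[sel], y_all[sel].
[/code]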

The coefficients themselves are complex, which shouldn't make too much difference, and their absolute values are within 1-2 orders of magnitude (e.g. 2 to 0.1). The magnitudes usually decrease from higher to lower order.

Numerically, it is better to write the function as a sum of orthogonal polynomials

Thanks for this pointer. Actually I already tried them. They definitely improve the condition number and would be very helpful with finite precision. However, with 64-bit floating point, the result is essentially the same for the same number of measurements.

Maybe I can describe at this point why I think it could work better: rather than looking at the problem in the time domain, it can be treated in the frequency domain, and rather than using the 100 time-domain samples, we could use 100 frequency-domain samples (e.g., the DFT of the sample points from above).

Suppose that x has a specific bandwidth B and the time domain measurements are such that the sampling rate is > 2*B*p.

In this case, the linear part (a1 term) only affects the frequency components up to B.
Similarly, the second-order part (a2) affects the components up to 2B, and so on.

So it does not make sense to incorporate the frequency components >B for estimating a1 or >2B for a2 and so on.
However, the standard LS approach does exactly this!
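A quick numerical check of this bandwidth argument (made-up tone frequencies and coefficients):

[code]
import numpy as np

# Numerical check (made-up values): x(t) is band-limited to B, so a_1*x only
# has content below B, a_2*x^2 below 2B, a_3*x^3 below 3B.
fs = 1000.0                          # sampling rate, well above 2*B*p
B = 50.0                             # bandwidth of x
p = 3
t = np.arange(4000) / fs             # 4 s of data; tones fall exactly on DFT bins

# Band-limited test signal: a few tones below B.
x = np.sin(2*np.pi*10*t) + 0.5*np.sin(2*np.pi*35*t) + 0.3*np.sin(2*np.pi*48*t)

a = [1.0, 0.3, 0.1]                  # example coefficients a_1, a_2, a_3
f = np.fft.rfftfreq(len(t), d=1/fs)

for k in range(1, p + 1):
    spec = np.abs(np.fft.rfft(a[k-1] * x**k))
    # Relative spectral content of the k-th order term above k*B:
    # essentially at the numerical noise floor.
    print(k, spec[f > k*B].max() / spec.max())
[/code]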

Is there any pointer where I could look regarding this?


One more note on the error: it is the error between the measured y and the fitted y that is minimized. However, for certain applications it is the nonlinear contribution (the spectral content > B) which is important and should be estimated more accurately than the spectral part < B.


I hope you can follow what I mean here.


Thanks!
divB
 

1. What is linear least squares (LS)?

Linear least squares is a statistical method for finding the best-fitting linear model for a set of data points. It minimizes the sum of the squared differences between the actual data points and the values predicted by the model. The method is commonly used in regression analysis to model the relationship between a dependent variable and one or more independent variables.

2. How does linear LS apply to polynomials?

Linear LS can also be used to fit polynomial functions to data points. A polynomial is a linear combination of powers of the independent variable, so by treating x, x^2, ..., x^p as separate regressors the model remains linear in its coefficients. The same principle of minimizing the sum of squared differences applies; the fit amounts to finding the best coefficients of the polynomial.
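For example, a minimal numpy sketch with illustrative values:

[code]
import numpy as np

# Fit a cubic to noisy samples of a known polynomial (illustrative values).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 40)
y = 2.0*x - 1.0*x**2 + 0.5*x**3 + 0.05*rng.standard_normal(x.size)

coeffs = np.polyfit(x, y, deg=3)     # highest power first
print(coeffs)                        # roughly [0.5, -1.0, 2.0, 0.0]
[/code]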

3. What are the limitations of linear LS on polynomials?

Linear LS on polynomials assumes the model is linear in its coefficients and that the chosen polynomial degree is adequate for the data. If the true relationship is not well approximated by a polynomial of that degree, or if the monomial design matrix is badly ill-conditioned (as happens for high degrees), the fit can be inaccurate and other regression methods or basis choices may be more suitable.

4. How can I improve the performance of linear LS on polynomials?

One way to improve the numerical behaviour of linear LS on polynomials is to use an orthogonal polynomial basis, such as Chebyshev polynomials, instead of raw powers of x. Spreading the x-values of the samples over a wider range also improves the estimates of the higher-order coefficients, and the polynomial degree should be chosen carefully to avoid overfitting.

5. Are there other methods for fitting polynomials to data?

Yes. Non-linear least squares can handle models that are non-linear in their parameters, and regularized regression or machine-learning methods such as neural networks can also be used to fit non-linear relationships. The choice of method depends on the specific data and research question at hand.
