# How to fit plane onto sampling data?

In summary, the conversation discusses the possibility of approximating a probability distribution as a linear function, specifically a plane. Various methods are suggested, including multiple linear regression, a least squares fit with constraints, and using a non-linear basis for projection. The idea of projection is explained, which involves turning data points into a function that can be projected onto a chosen basis. This technique is commonly used in applications such as image and signal processing.

#### sceptic

For example, I have variables x and y and a probability distribution p(x,y). I want to approximate p(x,y) as a linear function, a plane in this case, at least somewhere in the domain. However, I only have samples from the distribution. With a large amount of data it is easy to collect the samples into bins and fit a plane onto the estimated density function. But I don't want to compress the data; I would like to use the points themselves. Is there any method to do this?

Multiple linear regression would tell you what numbers, a1, a2, a3 would give the best fit of a1*X + a2*Y + a3 through the data. The "best fit" is defined as the one that minimizes the sum of the squared errors between the plane and the data.
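As a minimal sketch (with made-up data, not code from the thread), this least-squares plane fit can be done with NumPy's `lstsq`. Note that it assumes a z value for every (x, y) point:

```python
import numpy as np

# Synthetic (x, y, z) data: a noisy plane z = 2x - y + 0.5.
# This setup assumes a z value per point, which raw samples
# drawn from p(x, y) do not directly provide.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = rng.uniform(0, 1, 50)
z = 2.0 * x - 1.0 * y + 0.5 + rng.normal(0, 0.01, 50)

# Design matrix for a1*x + a2*y + a3; lstsq minimizes the sum of
# squared errors between the plane and the data.
A = np.column_stack([x, y, np.ones_like(x)])
(a1, a2, a3), *_ = np.linalg.lstsq(A, z, rcond=None)
print(a1, a2, a3)  # close to 2.0, -1.0, 0.5
```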

As FactChecker says, to fit a probability distribution you can do a least squares fit of a plane, but you must add the constraint that the volume under the plane integrates to 1. How to add that constraint is something we'd have to think about.

Stephen Tashi said:
As FactChecker says, to fit a probability distribution you can do a least squares fit of a plane, but you must add the constraint that the volume under the plane integrates to 1. How to add that constraint is something we'd have to think about.

Edit: But doing a linear regression would involve some kind of binning, so the data would represent a frequency instead of isolated measurements. Perhaps a maximum likelihood fit is needed if you don't want to bin the data.
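A sketch of what such an unbinned maximum likelihood fit could look like, under the assumption (mine, not the thread's) that the domain is the unit square, so that the plane p(x,y) = a·x + b·y + c is normalized by c = 1 − (a+b)/2:

```python
import numpy as np
from scipy.optimize import minimize

# Draw samples from the true density p(x, y) = x + y (a = b = 1, c = 0)
# on [0,1]^2 by rejection sampling against the bound p <= 2.
rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, (20000, 3))
samples = pts[pts[:, 2] * 2.0 < pts[:, 0] + pts[:, 1], :2]

def neg_log_likelihood(theta, xy):
    a, b = theta
    c = 1.0 - (a + b) / 2.0        # normalization: integral over [0,1]^2 is 1
    p = a * xy[:, 0] + b * xy[:, 1] + c
    if np.any(p <= 0):             # density must stay positive on the data
        return np.inf
    return -np.sum(np.log(p))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(samples,),
               method="Nelder-Mead")
a_hat, b_hat = res.x
print(a_hat, b_hat)  # should be near 1.0, 1.0
```

The key point is that each raw sample enters the likelihood directly; no binning is needed.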

Stephen Tashi said:
Edit: But doing a linear regression would involve some kind of binning, so the data would represent a frequency instead of isolated measurements. Perhaps a maximum likelihood fit is needed if you don't want to bin the data.
Sorry. I suggested regression without thinking enough. Never mind.

Stephen Tashi said:
binning so the data would represent a frequency instead of isolated measurements.

This is what I do not want! So the question remains!

It's difficult to give good advice about fitting a plane to "data" as a generality. What exactly is this data?

In particular what are the bounds on x and y ? Is there any theoretical reason to believe that (x,y) can exceed the ranges of x or y that were actually observed in the data? To fit a probability distribution, we must use a function whose integral over the possible (x,y) values is 1. So it's important to know if we have a good handle on the bounds of x and y.

What is the resolution and structure of the measurements?

sceptic said:
For example, I have variables x and y and a probability distribution p(x,y). I want to approximate p(x,y) as a linear function, a plane in this case, at least somewhere in the domain. However, I only have samples from the distribution. With a large amount of data it is easy to collect the samples into bins and fit a plane onto the estimated density function. But I don't want to compress the data; I would like to use the points themselves. Is there any method to do this?

A plane is determined by three points. You can always extract three points from a distribution. But what if the distribution does not look like a plane?

This is a normal distribution in x and y separately. How would you fit a plane onto that?

You need to pick a non-linear basis in this case rather than a linear one.

I would look at projecting your data points to some function that makes sense. If your function is a bi-variate normal then use that to start off with.

The topic of projection is covered in harmonic analysis and you would be using orthogonal polynomials within some interval.

chiro said:
I would look at projecting your data points to some function that makes sense. If your function is a bi-variate normal then use that to start off with.

What would it mean to "project a data point"?

In linear spaces (including infinite dimensional spaces) you can project points (or even functions) to a basis.

A good example of something non-linear is when you project a function or a set of data points to sine and cosine bases. These are orthogonal (the proof is straightforward) and the inner product is <f,g> = Integral[a,b] f(x)g(x)dx for the one dimensional case. You can extend it into multiple dimensions.
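A toy numerical illustration of this projection (my own sketch, with an assumed target f(x) = x² and a cosine basis on [-π, π]; the integral inner product is approximated by a Riemann sum):

```python
import numpy as np

# Assumed target f(x) = x**2 on [-pi, pi], projected onto the
# orthogonal basis {1, cos(x), cos(2x), ...}; the sine terms vanish
# because f is even.  <f, g> = Integral f(x)g(x)dx is approximated
# by a Riemann sum on a fine grid.
x = np.linspace(-np.pi, np.pi, 20001)
dx = x[1] - x[0]
f = x**2
ones = np.ones_like(x)

def inner(u, v):
    return np.sum(u * v) * dx

# Constant term <f, 1>/<1, 1>, then the cosine terms <f, e>/<e, e> * e
recon = np.full_like(x, inner(f, ones) / inner(ones, ones))
for n in range(1, 6):
    e = np.cos(n * x)
    recon += inner(f, e) / inner(e, e) * e

max_err = np.max(np.abs(f - recon))
print(max_err)  # truncation error of the 5-term cosine expansion
```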

The idea is that after the projection you reconstruct the function from the coefficients obtained by projecting it onto that basis, in much the same way we do projections in ordinary linear spaces (again using the inner product).

In a linear space we use <f, e_x>, where e_x is a basis vector with ||e_x|| = 1, and the same thing happens in infinite dimensional linear spaces, as with Fourier series, wavelets, and other structures with similar techniques - which are based on Hilbert space theory and harmonic analysis.

This idea of projecting functions or data points (different techniques for each) is done a lot. Image processing, video processing, audio processing, compression, and signal processing in general, among many other applications, make use of it.

Because the OP doesn't want to bin the data, a data point is not a probability. So I don't see how one could "project" a data point onto a function whose values represent a probability density.

You turn it into something that can be projected - like a function.

Interpolation is one way to do this - but it's not the only way.

Once you have a function and a basis to project to you find <f,g> and do the same thing to re-construct the function.

If you are projecting to a normal distribution, you take the approximated function you get back and estimate its parameters using expectations of that final function. You will typically use a set of orthogonal polynomials with enough accuracy (based on the degree of the set), and after reconstruction you get back a function expressed in the new basis.

Normal distributions involve exponential terms but in a given interval you can always approximate it well enough by choosing a high enough degree polynomial.
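For instance (a sketch using NumPy's `Polynomial.fit` as one convenient way to get such a polynomial), a degree-12 polynomial already tracks the standard normal density closely on [-3, 3]:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Standard normal density on a bounded interval
x = np.linspace(-3, 3, 601)
pdf = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Least-squares polynomial fit of sufficiently high degree
p = Polynomial.fit(x, pdf, deg=12)
max_err = np.max(np.abs(pdf - p(x)))
print(max_err)  # small everywhere on this interval
```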

If you have a lattice of points you would get a function that interpolates through all points - find an orthogonal basis for a bivariate polynomial of large enough degree - use the Gram-Schmidt process to get the orthogonal polynomials (using <f,g> as above) and then take your interpolated function and project it onto the new basis.
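The Gram-Schmidt step can be sketched numerically (my own one-dimensional toy example on [-1, 1], where the process recovers scaled Legendre polynomials; the target e^x stands in for the interpolated function):

```python
import numpy as np

# Orthogonalize the monomials 1, x, x^2, ... under the inner product
# <f, g> = Integral_{-1}^{1} f(x)g(x)dx, approximated on a grid.
x = np.linspace(-1, 1, 10001)
dx = x[1] - x[0]

def inner(u, v):
    return np.sum(u * v) * dx

# Gram-Schmidt on monomials -> (unnormalized) Legendre polynomials
basis = []
for k in range(6):
    e = x**k
    for q in basis:
        e = e - inner(e, q) / inner(q, q) * q
    basis.append(e)

# Project an assumed target f(x) = exp(x) onto the orthogonal basis
f = np.exp(x)
proj = sum(inner(f, q) / inner(q, q) * q for q in basis)
err = np.max(np.abs(f - proj))
print(err)  # small truncation error at degree 5
```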

After this you get a bivariate polynomial that approximates the bi-variate normal and then you can use that to see what the approximate distribution is.

You don't have to use a Normal distribution - but you do need a basis that "makes sense".

You can use expectation results of the new function to approximate the parameters of the distribution you are looking for - whatever it may be.

chiro said:
You turn it into something that can be projected - like a function.

Interpolation is one way to do this - but it's not the only way.

The data apparently has the form (x,y) and the goal is to fit a density function z = f(x,y). What kind of interpolation would you perform on the data? What would the domain and codomain of the interpolating function be? Unless you bin the data, there is no z value associated with the (x,y) data.

Piecewise linear would do as a minimum. You could get very creative, but you could create a surface based on piecewise-linear planes that you "glue" together.

Once you have this then you project it on another basis just like I mentioned above.

So for some box in R^2 you define a plane in that box and you parameterize the space so that for some x in [a,b] and y in [c,d] you have two parameters t and u; you convert your x's and y's into t's and u's for that region and use them to get the point of the plane. Basically, for [a,b] X [c,d] you have a plane and you get two vectors on the plane - one with respect to the x-axis and one with respect to the y-axis - and you parameterize so that t and u go from 0 to 1 and span all edges of the plane in that region.
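A minimal sketch of that parameterization (the box, plane coefficients, and function name here are illustrative assumptions):

```python
import numpy as np

# Over an assumed box [a, b] x [c, d] a plane patch
# z = alpha*x + beta*y + gamma is parameterized by t, u in [0, 1].
a, b, c, d = 0.0, 2.0, 1.0, 3.0        # assumed box in R^2
alpha, beta, gamma = 0.5, -1.0, 2.0    # assumed plane coefficients

def patch(t, u):
    """Map (t, u) in [0,1]^2 to a point on the plane over the box."""
    x = a + t * (b - a)
    y = c + u * (d - c)
    return np.array([x, y, alpha * x + beta * y + gamma])

print(patch(0.0, 0.0))  # corner above (a, c)
print(patch(1.0, 1.0))  # corner above (b, d)
```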

You can get more creative than this but it is the simplest to do for a function of two independent variables.

Projecting between different spaces is a common thing in fields like signal processing, image, video, and audio compression, pattern matching, and data analysis.

chiro said:
Piecewise linear would do as a minimum. You could get very creative, but you could create a surface based on piecewise-linear planes that you "glue" together.
Once you have this then you project it on another basis just like I mentioned above.
So for some box in R^2 you define a plane in that box and you parameterize the space so that for some x in [a,b] and y in [c,d] you have two parameters t and u such that you convert your x's and y's into t's and u's for that region and use that to get the point of the plane.

The problem with that is that there is no given plane in 3-D associated with a box of the (x,y) data, and it isn't clear how to define an appropriate plane without binning the data.

That is correct but you could make the bins small enough to approximate the effect of a continuous space.

It just means that you have a lot of boxes to use. If the bin size is small enough then the departure between a truly continuous distribution and the discrete one will be quite small. As long as you have enough points in a given region (i.e. not sparse) it should be OK and still faithfully capture the nature of the distribution.
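As a sketch of this fine-binning idea (synthetic data and bin counts are my own assumptions), a normalized 2-D histogram closely approximates the continuous density when the bins are small and well populated:

```python
import numpy as np

# Many samples from a bivariate standard normal, binned finely;
# density=True normalizes the histogram into a density estimate.
rng = np.random.default_rng(2)
samples = rng.normal(0.0, 1.0, (200000, 2))

hist, xedges, yedges = np.histogram2d(samples[:, 0], samples[:, 1],
                                      bins=50, range=[[-4, 4], [-4, 4]],
                                      density=True)

# Compare the bin nearest the origin with the true density there
ix = np.searchsorted(xedges, 0.0) - 1
iy = np.searchsorted(yedges, 0.0) - 1
true_peak = 1.0 / (2 * np.pi)   # bivariate standard normal at (0, 0)
print(hist[ix, iy], true_peak)
```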

Also notice that when you reconstruct the new function with a basis that is purely continuous and smooth, the reconstruction will reflect that. You will have a normal curve, or a Gamma curve, or any other curve after you project your binned data onto your new Hilbert-space basis; even though the binned data are discrete, the new basis stays smooth, though you lose some resolution in the projection process.

The key really is choosing the right orthogonal polynomial basis within the region that you are looking at and understanding the nature of projection between binned data and the final linear combination of basis polynomials.

This idea of taking discrete data (like the binned examples) and turning it into some "smooth" function is commonly done in signal processing - particularly in data compression (think video, image, and audio as examples).