- #1

- 10

- 0

- Thread starter sceptic
- Start date

Normal distribution. In summary, the conversation discusses the possibility of approximating a probability distribution as a linear function, specifically a plane. Various methods are suggested, including multiple linear regression, least squares fit with constraints, and using a non-linear basis for projection. The idea of projection is explained, which involves turning data points into a function that can be projected onto a chosen basis. This technique is commonly used in various applications such as image and signal processing.

- #2

Science Advisor

Homework Helper

Gold Member

- 8,215

- 3,818

- #3

Science Advisor

- 7,859

- 1,591

- #4

Science Advisor

- 7,859

- 1,591

Stephen Tashi said:

Edit: But doing a linear regression would involve some kind of binning so the data would represent a frequency instead of isolated measurements. Perhaps a maximum likelihood fit is needed if you don't want to bin the data.
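For what it's worth, here is a minimal sketch of what an unbinned maximum likelihood fit of a plane could look like. It assumes the support is a known box and writes the density as 1/area plus two slopes measured from the box centre, so it integrates to 1 for any slopes. The function names (`sample_plane`, `fit_plane_density`), the Newton iteration and the test data are all my own choices for illustration, not something proposed in the thread.

```python
import numpy as np

def sample_plane(n, slope_x, slope_y, rng):
    """Rejection-sample the unit square from f = 1 + slope_x*(x-0.5) + slope_y*(y-0.5)."""
    x, y = rng.random(n), rng.random(n)
    f = 1.0 + slope_x * (x - 0.5) + slope_y * (y - 0.5)
    f_max = 1.0 + 0.5 * (abs(slope_x) + abs(slope_y))
    keep = rng.random(n) * f_max < f
    return x[keep], y[keep]

def fit_plane_density(x, y, bounds, iters=50):
    """Maximise sum(log f(x_i, y_i)) for f = 1/area + sx*(x - xc) + sy*(y - yc)."""
    x0, x1, y0, y1 = bounds
    base = 1.0 / ((x1 - x0) * (y1 - y0))          # value of the plane at the box centre
    D = np.column_stack([np.asarray(x, float) - 0.5 * (x0 + x1),
                         np.asarray(y, float) - 0.5 * (y0 + y1)])
    hx, hy = 0.5 * (x1 - x0), 0.5 * (y1 - y0)
    corners = np.array([[-hx, -hy], [-hx, hy], [hx, -hy], [hx, hy]])

    def loglik(theta):
        # A plane is non-negative on the box iff it is non-negative at the corners.
        if base + np.min(corners @ theta) <= 0.0:
            return -np.inf
        return np.sum(np.log(base + D @ theta))

    theta = np.zeros(2)                            # start from the uniform density
    for _ in range(iters):
        f = base + D @ theta
        grad = D.T @ (1.0 / f)
        hess = -(D / f[:, None] ** 2).T @ D        # negative definite: log-likelihood is concave
        step = np.linalg.solve(hess, -grad)        # Newton step
        t = 1.0
        while loglik(theta + t * step) < loglik(theta) and t > 1e-12:
            t *= 0.5                               # backtrack to stay inside the valid region
        theta = theta + t * step
        if np.linalg.norm(t * step) < 1e-10:
            break
    return base, theta[0], theta[1]

rng = np.random.default_rng(0)
xs, ys = sample_plane(60000, 0.8, 0.4, rng)
print(fit_plane_density(xs, ys, (0.0, 1.0, 0.0, 1.0)))
```

No z values and no bins are needed here: the isolated (x,y) points enter only through the log of the candidate density evaluated at each point.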

- #5

Science Advisor

Homework Helper

Gold Member

- 8,215

- 3,818

Stephen Tashi said:

Edit: But doing a linear regression would involve some kind of binning so the data would represent a frequency instead of isolated measurements. Perhaps a maximum likelihood fit is needed if you don't want to bin the data.

Sorry. I suggested regression without thinking enough. Never mind.

- #6

- 10

- 0

Stephen Tashi said:

binning so the data would represent a frequency instead of isolated measurements.

This is what I do not want! So the question remains!

- #7

Science Advisor

- 7,859

- 1,591

In particular, what are the bounds on x and y? Is there any theoretical reason to believe that (x,y) can exceed the ranges of x or y that were actually observed in the data? To fit a probability distribution, we must use a function whose integral over the possible (x,y) values is 1. So it's important to know if we have a good handle on the bounds of x and y.
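To make the normalization condition concrete for a plane, here is a short worked version. It assumes the possible values form a known box $[a,b]\times[c,d]$ and writes the candidate density as $f(x,y)=\alpha+\beta x+\gamma y$; the box and the symbols $\alpha,\beta,\gamma$ are assumptions for illustration only.

$$\int_a^b\!\!\int_c^d(\alpha+\beta x+\gamma y)\,dy\,dx=(b-a)(d-c)\left[\alpha+\beta\,\frac{a+b}{2}+\gamma\,\frac{c+d}{2}\right]=1$$

So the plane must take the value $1/\big((b-a)(d-c)\big)$ at the centre of the box, which leaves only the two slopes $\beta$ and $\gamma$ free. Since a linear function on a rectangle attains its minimum at a corner, $f\ge 0$ on the whole box exactly when $f\ge 0$ at the four corners. Both conditions depend on the bounds, which is why the bounds matter so much.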

- #8

Science Advisor

- 4,816

- 134

What is the resolution and structure of the measurements?

- #9

Science Advisor

- 2,295

- 793

sceptic said:

A plane is determined by three points. You can always extract three points from a distribution. But what if the distribution

This is a normal distribution in x and y separately. How would you fit a plane onto that?

- #10

Science Advisor

- 4,816

- 134

I would look at projecting your data points to some function that makes sense. If your function is a bi-variate normal then use that to start off with.

The topic of projection is covered in harmonic analysis and you would be using orthogonal polynomials within some interval.

- #11

Science Advisor

- 7,859

- 1,591

chiro said: I would look at projecting your data points to some function that makes sense. If your function is a bi-variate normal then use that to start off with.

What would it mean to "project a data point"?

- #12

Science Advisor

- 4,816

- 134

A good example of something non-linear is when you project a function or a set of data points onto a sine and cosine basis. These functions are orthogonal (the proof is straightforward) under the inner product <f,g> = Integral[a,b] f(x)g(x)dx in the one-dimensional case. You can extend it to multiple dimensions.

The idea is that you compute the coefficients obtained from projecting onto that basis, in much the same way we do projections in ordinary linear spaces (again using the inner product), and then use those coefficients to reconstruct the function.

In a linear space we use <f,e_x> where e_x is a basis vector with ||e_x|| = 1, and the same thing happens for infinite-dimensional linear spaces as with Fourier series, wavelets, and other structures with similar techniques - which are based on Hilbert space theory and harmonic analysis.

This idea of projection of functions or data points (different techniques for both kinds) is done a lot. Image processing, video processing, audio processing, compression, signal processing (general signals) and many other applications make use of this.
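A small sketch of the projection and reconstruction just described, for a one-dimensional function on [-pi, pi] with the basis 1, cos(kx), sin(kx). The names `inner` and `fourier_project`, the grid size, and the choice of a plain Riemann sum for the integral are assumptions made for this example.

```python
import numpy as np

def inner(f_vals, g_vals, dx):
    """Approximate <f,g> = Integral[a,b] f(x)g(x)dx by a Riemann sum on a uniform grid."""
    return np.sum(f_vals * g_vals) * dx

def fourier_project(f, n_terms, n_grid=2048):
    """Project f onto {1, cos(kx), sin(kx) : k = 1..n_terms} and reconstruct it."""
    x = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    dx = 2.0 * np.pi / n_grid
    fx = f(x)
    basis = [np.ones_like(x)]
    for k in range(1, n_terms + 1):
        basis += [np.cos(k * x), np.sin(k * x)]
    # The basis is orthogonal but not normalised, so divide by <e,e>.
    coeffs = [inner(fx, e, dx) / inner(e, e, dx) for e in basis]
    recon = sum(c * e for c, e in zip(coeffs, basis))
    return x, fx, recon, coeffs

x, fx, recon, coeffs = fourier_project(lambda x: np.exp(-x**2), n_terms=10)
print(np.max(np.abs(fx - recon)))   # size of the reconstruction error on the grid
```

Dividing by <e,e> plays the role of the ||e_x|| = 1 normalisation: it is the same projection formula, written for a basis that has not been scaled to unit length.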

- #13

Science Advisor

- 7,859

- 1,591

- #14

Science Advisor

- 4,816

- 134

Interpolation is one way to do this - but it's not the only way.

Once you have a function and a basis to project onto, you find <f,g> for each basis element and use the resulting coefficients to reconstruct the function.

If you are projecting to a Normal distribution then you take the approximated function you get back and estimate the parameters using expectations of that final function. You will typically use a set of orthogonal polynomials with enough accuracy (based on the degree of the set of orthogonal polynomials), and after you reconstruct it you get back a function expressed in the new basis.

Normal distributions involve exponential terms but in a given interval you can always approximate it well enough by choosing a high enough degree polynomial.

If you have a lattice of points you would get a function that interpolates through all points, then find an orthogonal basis of bivariate polynomials of large enough degree (use the Gram-Schmidt process with <f,g> as above to get the orthogonal polynomials), and then take your interpolated function and project it onto the new basis.

After this you get a bivariate polynomial that approximates the bi-variate normal and then you can use that to see what the approximate distribution is.

You don't have to use a Normal distribution - but you do need a basis that "makes sense".

You can use expectation results of the new function to approximate the parameters of the distribution you are looking for - whatever it may be.
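As a rough illustration of this recipe in one dimension, the sketch below projects a normal density onto Legendre polynomials on an interval and then reads the mean and variance off the reconstructed polynomial. Legendre polynomials are the orthogonal polynomials for the inner product <f,g> with constant weight, so no explicit orthogonalisation step is needed here. The function names, the interval [-2, 4], the degree 30 and the use of Gauss-Legendre quadrature are all assumptions for the example.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_project(f, a, b, degree, n_quad=200):
    """Coefficients of the projection of f on [a,b] onto Legendre polynomials P_0..P_degree."""
    t, w = L.leggauss(n_quad)                  # quadrature nodes and weights on [-1, 1]
    x = 0.5 * (b - a) * t + 0.5 * (a + b)      # map the nodes to [a, b]
    V = L.legvander(t, degree)                 # V[i, k] = P_k(t_i)
    k = np.arange(degree + 1)
    # <P_k, P_k> = 2/(2k+1) on [-1, 1], hence the (2k+1)/2 factor.
    return (2 * k + 1) / 2.0 * (V.T @ (w * f(x)))

def evaluate(coeffs, x, a, b):
    """Evaluate the reconstructed polynomial at points x in [a, b]."""
    return L.legval((2.0 * x - (a + b)) / (b - a), coeffs)

def moments(coeffs, a, b, n_quad=200):
    """Total mass, mean and variance of the reconstructed function on [a, b]."""
    t, w = L.leggauss(n_quad)
    x = 0.5 * (b - a) * t + 0.5 * (a + b)
    p = L.legval(t, coeffs)
    jac = 0.5 * (b - a)
    mass = jac * np.sum(w * p)
    mean = jac * np.sum(w * x * p) / mass
    var = jac * np.sum(w * (x - mean) ** 2 * p) / mass
    return mass, mean, var

def normal_pdf(x, mu=1.0, sigma=0.5):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

coeffs = legendre_project(normal_pdf, -2.0, 4.0, degree=30)
print(moments(coeffs, -2.0, 4.0))   # mass, mean, variance of the polynomial approximation
```

One design note: an orthogonal projection preserves the integrals of the function against every polynomial in the span, so the low-order moments of the reconstruction match those of the original function on the interval up to quadrature error.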

- #15

Science Advisor

- 7,859

- 1,591

chiro said: You turn it into something that can be projected - like a function.

Interpolation is one way to do this - but it's not the only way.

The data apparently has the form (x,y) and the goal is to fit a density function z = f(x,y). What kind of interpolation would you perform on the data? What would the domain and codomain of the function that interpolates be? Unless you bin the data, there is no z value associated with the (x,y) data.

- #16

Science Advisor

- 4,816

- 134

Once you have this then you project it on another basis just like I mentioned above.

So for some box in R^2 you define a plane in that box and you parameterize the space so that for some x in [a,b] and y in [c,d] you have two parameters t and u such that you convert your x's and y's into t's and u's for that region and use that to get the point of the plane. Basically for [a,b] X [c,d] you have a plane and you get two vectors on the plane - one with respect to the x-axis and one with respect to y and you parameterize so that t and u go from 0 to 1 that span all edges of the plane in that region.

You can get more creative than this but it is the simplest to do for a function of two independent variables.

Projecting between different spaces is a common thing in fields like signal processing, compressing images, videos, and audio and doing pattern matching and data analysis activities.
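A minimal sketch of the box parameterization described above: for x in [a,b] and y in [c,d], convert to t and u in [0,1] and move along two edge vectors of the plane. The heights at three corners (`z00`, `z10`, `z01`) are hypothetical inputs; where they would come from (for example some local frequency estimate) is exactly the part this sketch does not settle.

```python
def plane_patch(box, z00, z10, z01):
    """Plane over the box [a,b] x [c,d] through heights z00 at (a,c), z10 at (b,c), z01 at (a,d)."""
    a, b, c, d = box

    def height(x, y):
        t = (x - a) / (b - a)      # 0 at x = a, 1 at x = b
        u = (y - c) / (d - c)      # 0 at y = c, 1 at y = d
        # Start at the (a,c) corner and move along the two edge vectors.
        return z00 + t * (z10 - z00) + u * (z01 - z00)

    return height

h = plane_patch((0.0, 2.0, 0.0, 4.0), z00=1.0, z10=3.0, z01=2.0)
print(h(1.0, 2.0))   # height at the centre of the box
```

Gluing several such patches together gives the piecewise-linear surface, provided neighbouring patches agree on their shared edges.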

- #17

Science Advisor

- 7,859

- 1,591

chiro said: Piecewise linear would do as a minimum. You could get very creative but you could create a surface based on piece-wise linear planes that you "glue" together.

Once you have this then you project it on another basis just like I mentioned above.

So for some box in R^2 you define a plane in that box and you parameterize the space so that for some x in [a,b] and y in [c,d] you have two parameters t and u such that you convert your x's and y's into t's and u's for that region and use that to get the point of the plane.

The problem with that is that there is no given plane in 3-D that is associated with a box of the (x,y) data, and it isn't clear how to define an appropriate plane without binning the data.

- #18

Science Advisor

- 4,816

- 134

It just means that you have a lot of boxes to use. If the bin size is small enough then the departure between a truly continuous distribution and a discrete one will be quite small. As long as you have enough points in a given region (i.e. not sparse) then it should be OK and still faithfully capture the nature of the distribution.

Also notice that when you reconstruct the new function with the basis that is purely continuous and smooth the reconstruction will reflect that. You will have a Normal curve, or a Gamma curve, or any other curve after you project your binned data back to your new Hilbert-space basis so even though the binned data are discrete, your new basis will still stay the way it is but it will lose some resolution in the whole projection process.

The key really is choosing the right orthogonal polynomial basis within the region that you are looking at and understanding the nature of projection between binned data and the final linear combination of basis polynomials.

This idea of taking discrete data (like the binned examples) and then turning it into some "smooth" function is what is commonly done in signal processing - particularly in the areas of data compression (think video, image, audio as examples).
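To tie this back to the original question, here is a sketch of the binned route for the plane case: bin the (x,y) data into a normalised 2-D histogram, then project it onto the basis 1, x - xc, y - yc, which is orthogonal on a box centred at (xc, yc). The function name, the bin count and the synthetic test data are assumptions for illustration.

```python
import numpy as np

def project_histogram_to_plane(x, y, bounds, bins=20):
    """Project a normalised 2-D histogram onto {1, x - xc, y - yc} over the box [a,b] x [c,d]."""
    a, b, c, d = bounds
    H, xe, ye = np.histogram2d(x, y, bins=bins, range=[[a, b], [c, d]], density=True)
    xm = 0.5 * (xe[:-1] + xe[1:])              # bin centres in x
    ym = 0.5 * (ye[:-1] + ye[1:])              # bin centres in y
    dA = (xe[1] - xe[0]) * (ye[1] - ye[0])     # area of one bin
    X, Y = np.meshgrid(xm, ym, indexing="ij")  # matches H[i, j]: i indexes x, j indexes y
    basis = [np.ones_like(X), X - 0.5 * (a + b), Y - 0.5 * (c + d)]

    def inner(f, g):
        return np.sum(f * g) * dA              # discrete version of <f,g>

    # Coefficients of the plane: constant term, x slope, y slope.
    return [inner(H, e) / inner(e, e) for e in basis]

# Synthetic data on the unit square with density 1 + 0.8(x-0.5) + 0.4(y-0.5).
rng = np.random.default_rng(1)
px, py = rng.random(60000), rng.random(60000)
keep = rng.random(60000) * 1.6 < 1.0 + 0.8 * (px - 0.5) + 0.4 * (py - 0.5)
print(project_histogram_to_plane(px[keep], py[keep], (0.0, 1.0, 0.0, 1.0)))
```

The constant coefficient comes out as 1/area because the histogram is normalised, so the fitted plane still integrates to 1 over the box; whether it stays non-negative at the corners has to be checked separately.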
