# I have a random sequence; is there some operation I can do on the sequence to get its pdf?

1. Oct 12, 2012

### dexterdev

Hi all,
Suppose I have a random discrete sequence such as x = [1 2 3 2 5 2 4 2 3 1 6 3 5] (where the possible outcomes are 1, 2, 3, 4, 5 or 6) and want to get its frequency distribution vector
f = [2 4 3 1 2 1], meaning that 1 occurs 2 times, 2 occurs 4 times, and so on. I would like a mathematical function or operator that transforms the vector x into f. Is that possible?

In general, x is the input to the system and f must be the output. Also, how can this be generalized to the continuous case?
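
For the continuous case, one common approach is a normalized histogram, which approximates the density by bin counts scaled so the total area is 1. A minimal sketch, assuming NumPy is available; the simulated normal data, the seed, and the choice of 30 bins are illustrative assumptions, not something from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the sketch is repeatable
samples = rng.normal(loc=0.0, scale=1.0, size=5000)   # hypothetical continuous data

# density=True rescales the bin counts so that the bars integrate to 1.
density, edges = np.histogram(samples, bins=30, density=True)
widths = np.diff(edges)

print(np.sum(density * widths))         # total area under the histogram
```

The number of bins controls the trade-off between a noisy and an over-smoothed estimate, so it usually needs some tuning for real data.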

TIA

2. Oct 12, 2012

### dexterdev

At least it would be helpful to know how to find an inverse-like operation for a non-square matrix or vector, for example:

if Ax = B

then x = inv(A) B, or something like that. I don't know if that works.
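
For a non-square matrix the usual stand-in for an inverse is the Moore-Penrose pseudo-inverse, which gives the least-squares solution of Ax = B. A small sketch, assuming NumPy; the matrix `A` and the vector `x_true` are made-up values chosen so the system is consistent:

```python
import numpy as np

# Hypothetical 3x2 system: more equations than unknowns, so A has no ordinary inverse.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x_true = np.array([2.0, -1.0])
B = A @ x_true

# Moore-Penrose pseudo-inverse: gives the least-squares solution of A x = B.
x_pinv = np.linalg.pinv(A) @ B

# np.linalg.lstsq solves the same least-squares problem without forming pinv(A).
x_lstsq, *_ = np.linalg.lstsq(A, B, rcond=None)

print(x_pinv, x_lstsq)
```

Note that this only applies to linear relations between x and B; it does not by itself turn a sequence into its frequency vector.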

3. Oct 13, 2012

### ImaLooser

So you are using data to guess the pdf. It's called the "empirical pdf". It's obvious how to derive it, isn't it? I don't see what the problem is.

4. Oct 13, 2012

### dexterdev

Thank you for the reply; I did not know the term 'empirical pdf'. Anyway, can you suggest some references to help me? Are there equations to find it? I mean some sort of transformation applied to the input random sequence to get the probability density function.

Last edited: Oct 13, 2012
5. Oct 13, 2012

### haruspex

As I read it, the OP is asking for a mathematical function to convert a vector representing results of trials to a vector consisting of the counts of the different outcomes.
Dexterdev, I'm not aware of any such function. It certainly couldn't be a matrix since it would not be a linear operation. Why do you need it to be a mathematical function?
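
One way to see the non-linearity point: counting is not linear in x itself, but it does become a plain sum once each trial is written as an indicator (one-hot) vector. A sketch of that idea, assuming NumPy; the helper name `counts` is made up for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 2, 5, 2, 4, 2, 3, 1, 6, 3, 5])
outcomes = np.arange(1, 7)              # possible values 1..6

def counts(seq):
    # One-hot encode: row i has a 1 in the column of the outcome seq[i].
    one_hot = (np.asarray(seq)[:, None] == outcomes[None, :]).astype(int)
    # Summing the rows is a linear operation on the one-hot matrix.
    return one_hot.sum(axis=0)

f = counts(x)
print(f)

# Not linear in x itself: doubling every entry does not double the counts.
print(counts(2 * x), 2 * f)
```

So the map from x to f can be written as "encode, then sum", where only the encoding step is non-linear.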

6. Oct 13, 2012

### Stephen Tashi

You haven't defined a specific mathematical problem because "random discrete sequence" isn't a specific type of random process. You must define the process that generates the sequence. For example, it isn't clear whether the selection of one term in your sequence is independent of the selection of the other terms. If you have a real life application in mind, you'll get better advice by telling what it is.

7. Oct 14, 2012

### Mastersbn

There is a standard method for finding the pdf of discrete random variables (DRVs): count the number of occurrences of each value and divide that count by the total number of samples considered. The larger the number of samples, the more accurate the estimate. If you want a system that outputs the pdf for a given set of input observation samples, program the above method as the system's operation and implement it. Why look for some other operation? What is the problem with the existing method?
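
The count-and-divide method described here can be sketched in a few lines. This assumes NumPy and integer outcomes 1..6; the function name `empirical_pmf` and the simulated fair die are illustrative assumptions:

```python
import numpy as np

def empirical_pmf(seq, n_outcomes=6):
    # Count occurrences of 1..n_outcomes, then divide by the number of samples.
    seq = np.asarray(seq)
    counts = np.bincount(seq, minlength=n_outcomes + 1)[1:]
    return counts / seq.size

x = [1, 2, 3, 2, 5, 2, 4, 2, 3, 1, 6, 3, 5]
print(empirical_pmf(x))                 # relative frequencies of the 13 samples

# Simulated fair die (an assumption for illustration): estimate should approach 1/6.
rng = np.random.default_rng(0)
for n in (100, 100_000):
    rolls = rng.integers(1, 7, size=n)
    err = np.max(np.abs(empirical_pmf(rolls) - 1 / 6))
    print(n, err)
```

The printed errors give a rough feel for how the estimate tightens as the number of samples grows.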

8. Oct 14, 2012

### dexterdev

OK guys, I will explain why I need such a mathematical operation. I would like to illustrate the central limit theorem. We know that if x1 and x2 are two independent random variables with pdfs pdf(x1) and pdf(x2) respectively, then pdf(x1 + x2) = convolution(pdf(x1), pdf(x2)).

So I thought of explaining it this way:

x1 ------------------> pdf(x1) = some equation depending on x1
x2 ------------------> pdf(x2) = some equation depending on x2

x1 + x2 -------------> pdf(x1+x2) = some equation depending on x1 and x2, i.e. here convolution(pdf(x1), pdf(x2))
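
The convolution rule for independent variables can be tried numerically. A minimal sketch, assuming NumPy and using fair six-sided dice as the example distribution (the dice and the choice of n = 10 are assumptions for illustration):

```python
import numpy as np

# pmf of one fair die on outcomes 1..6 (assumed here for illustration).
die = np.full(6, 1 / 6)

# pmf of x1 + x2 for independent dice: outcomes 2..12.
two_dice = np.convolve(die, die)

# Repeated convolution gives the pmf of a sum of n dice; it grows bell-shaped.
n = 10
pmf_sum = die.copy()
for _ in range(n - 1):
    pmf_sum = np.convolve(pmf_sum, die)

support = np.arange(n, 6 * n + 1)       # possible sums n..6n
mean = np.sum(support * pmf_sum)
print(two_dice)
print(mean)                             # mean of the sum of n dice
```

Plotting `pmf_sum` against `support` for increasing n is one way to illustrate the central limit theorem.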

Is it right that finding the discrete probability density function is similar to mapping a sequence to another domain with some loss of information?

TIA

Last edited: Oct 14, 2012
9. Oct 14, 2012

### Stephen Tashi

I see no similarity between the two situations. The pdf of x1+x2 is a function of one variable, not two variables. How can you write the pdf of a sequence as a function of one variable?

10. Oct 14, 2012

### dexterdev

If you don't mind, please explain that to me.

11. Oct 15, 2012

### Stephen Tashi

Explain what?

12. Oct 15, 2012

### chiro

Hey dexterdev.

Are you trying to take a set of data (sample data) to generate a purely symbolic representation for the PDF function (in symbolic form) given the sample?

13. Oct 15, 2012

### dexterdev

yes.

14. Oct 15, 2012

### chiro

Well in that case you're going to face quite a lot of issues.

The first thing you need to think about is what the space of possible representations will be and the technique you will use.

There is an area (or maybe sub-area is the right word) that deals with interpolation, and this is a whole field of numerical analysis in itself. Interpolation is basically a way of generating representations for a function that go through known points, with special properties that are unique to the interpolation algorithm.

There are many interpolation algorithms. Some of the more complex methods are known as NURBS (Non-Uniform Rational B-Splines), which let you interpolate not only over points but also specify the multiplicity of each point with special vectors, giving you a lot more control over the process than the standard Lagrange polynomial.

Now, the interpolation will generate polynomial expressions, but these expressions will look like garbage: with hundreds of data points you get hundreds of terms, and if this grows to thousands or tens of thousands of points, you can see that doing this is going backwards.
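
To make the interpolation route concrete, here is a small sketch, assuming NumPy. It fits a degree n-1 polynomial through n points, which coincides with the Lagrange interpolating polynomial; the six points are simply the outcome values and counts from the example at the top of the thread, used as illustrative data:

```python
import numpy as np

# Hypothetical points: outcome values and their counts from the example sequence.
xs = np.array([1, 2, 3, 4, 5, 6], dtype=float)
ys = np.array([2, 4, 3, 1, 2, 1], dtype=float)

# A degree n-1 polynomial fitted to n points passes through all of them,
# so it coincides with the Lagrange interpolating polynomial.
coeffs = np.polyfit(xs, ys, deg=len(xs) - 1)
poly = np.poly1d(coeffs)

print(poly)                 # six coefficients for six points
print(poly(xs))             # reproduces ys up to rounding
```

Even with six points the expression is not very readable, and the number of terms grows with the number of points, which is the drawback described above.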

So that's the interpolation side.

The second way that is looked at deals with what happens in signal processing, data compression, and other similar fields (these two are applied everywhere and signal processing is a field of its own for good reason).

What happens in signal processing is that you have a signal and an orthogonal basis and you project the signal to the basis and re-construct the signal as a linear combination of basis vectors.

In mathematics this kind of thing is known as Fourier analysis and deals with orthogonal functions: you take a signal, get its component along each basis function, and obtain coefficients that are used to reconstruct the signal relative to that basis.
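
The project-and-reconstruct idea can be sketched with the discrete Fourier basis. This assumes NumPy; the signal is the example frequency vector from the start of the thread, and the explicit basis matrix `basis` is built by hand only to make the projection step visible:

```python
import numpy as np

signal = np.array([2, 4, 3, 1, 2, 1], dtype=float)   # example frequency vector
N = signal.size

# Orthonormal discrete Fourier basis: row k is the vector b_k[n] = exp(2*pi*i*k*n/N)/sqrt(N).
k = np.arange(N)[:, None]
n = np.arange(N)[None, :]
basis = np.exp(2j * np.pi * k * n / N) / np.sqrt(N)

# Projection: coefficient k is the inner product <b_k, signal>.
coeffs = basis.conj() @ signal

# Reconstruction: linear combination of the basis vectors with those coefficients.
recon = basis.T @ coeffs

print(np.round(recon.real, 6))
```

In practice `np.fft.fft` computes the same coefficients (up to the 1/sqrt(N) scaling used here) much more efficiently.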

The third way deals with a form of convergence to a particular model. This is used frequently in probability and statistics, and one particular method is known as EM (expectation maximization).

This works by fitting an arbitrary distribution to a fixed model and getting the best representation of the arbitrary data relative to that model.

So you have to assume a PDF model, and then the algorithm takes your data and provides the best fit according to that model.
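
As a simple illustration of fitting data to an assumed model, the sketch below uses plain maximum likelihood for a single normal distribution. This is not the EM algorithm itself (EM is typically needed when the model has hidden variables, such as a mixture); the normal model, the simulated data, and the name `fitted_pdf` are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=10_000)   # hypothetical sample

# Assumed model: a single normal distribution N(mu, sigma^2).
# Its maximum-likelihood estimates are the sample mean and the (biased) sample std.
mu_hat = data.mean()
sigma_hat = data.std()      # ddof=0, the maximum-likelihood form

def fitted_pdf(t):
    # Density of the fitted normal model evaluated at t.
    return np.exp(-0.5 * ((t - mu_hat) / sigma_hat) ** 2) / (sigma_hat * np.sqrt(2 * np.pi))

print(mu_hat, sigma_hat)
```

The result is a symbolic pdf with two fitted numbers, but only because the normal form was assumed up front.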

The difference between two and three is that the second is an explicit technique and the third is an implicit technique.

So now you are faced with a few decisions: the first approach generates a symbolic equation that doesn't give you anything more useful than the raw data, and the other two require you to supply a basis or a model to fit to.

You have a trade-off of either not making any assumptions about constraints and getting something that just confuses you more, or you make assumptions about the constraints which means you are pre-defining the characteristics of the model anyway and simply fitting your data to a pre-defined set of constraints.

So what do you choose?

15. Oct 16, 2012

### dexterdev
