I have a random sequence; is there some operation I can do on it to get its pdf?

  Oct 12, 2012 #1
    Hi all,
    Suppose I have a random discrete sequence like x = [1 2 3 2 5 2 4 2 3 1 6 3 5] (where the possible outcomes are 1, 2, 3, 4, 5 or 6) and want to get its frequency distribution vector
    f = [2 4 3 1 2 1], which means 1 occurs 2 times, 2 occurs 4 times, and so on. I want a mathematical function or operator that transforms the vector x into f. Is that possible?

    In general, x is the input to the system and f must be the output. And how can this case be generalized to the continuous case?
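The x-to-f operation is an ordinary counting (histogram) function. A minimal Python sketch, assuming the set of possible outcomes is known in advance; `frequency_vector` is a hypothetical helper name, not something from the thread:

```python
from collections import Counter

def frequency_vector(x, outcomes):
    """Count how many times each outcome in `outcomes` occurs in the sequence x."""
    counts = Counter(x)
    # Counter returns 0 for outcomes that never appear in x
    return [counts[k] for k in outcomes]

x = [1, 2, 3, 2, 5, 2, 4, 2, 3, 1, 6, 3, 5]
f = frequency_vector(x, outcomes=range(1, 7))
print(f)  # [2, 4, 3, 1, 2, 1]
```

Dividing each entry of f by len(x) = 13 turns the counts into relative frequencies.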

    TIA
     
  Oct 12, 2012 #2

    At least it would be helpful to know how to find an inverse-like operation for a non-square matrix or vector, like

    if Ax = B,

    x = inv(A) B, something like that. I don't know if that works.
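For the non-square case, the usual stand-in for inv(A) is the Moore–Penrose pseudoinverse. Assuming A has full column rank (an assumption; the post does not say), it gives the least-squares solution of Ax = B. Note that this only helps when the operation really is a linear map x ↦ Ax.

```latex
A^{+} = (A^{\mathsf T} A)^{-1} A^{\mathsf T},
\qquad
x = A^{+} B = \arg\min_{z} \lVert A z - B \rVert_2 .
```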
     
  Oct 13, 2012 #3

    So you are using data to guess the pdf. It's called the "empirical pdf". It's obvious how to derive it, isn't it? I don't see what the problem is.
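Written out, the empirical pdf of a discrete sample x_1, ..., x_n is the indicator-function count divided by n. For the continuous case, one common analogue is the histogram estimator, where the bins B_j and their width h are choices the user has to make (an assumption, not fixed by the data):

```latex
\hat{p}(k) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i = k\},
\qquad
\hat{f}(t) = \frac{\#\{\, i : x_i \in B_j \,\}}{n\,h}
\quad \text{for } t \in B_j .
```

For the sequence in the first post, n = 13 and \hat{p} = (2, 4, 3, 1, 2, 1)/13.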
     
  Oct 13, 2012 #4

    Thank you for the reply; I didn't know the term 'empirical pdf'. Anyway, can you suggest some references to help me? Are there equations to find it? I mean some sort of transformation we apply to the input random sequence to get the probability density function.
     
    Last edited: Oct 13, 2012
  Oct 13, 2012 #5

    haruspex

    Science Advisor
    Homework Helper
    Gold Member
    2016 Award

    As I read it, the OP is asking for a mathematical function to convert a vector representing results of trials to a vector consisting of the counts of the different outcomes.
    Dexterdev, I'm not aware of any such function. It certainly couldn't be a matrix since it would not be a linear operation. Why do you need it to be a mathematical function?
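To see why no matrix acting directly on x can do this: a linear map L would satisfy L(a + b) = L(a) + L(b), and counting fails that test. The Python check below uses hypothetical helper names. It also shows a standard workaround: after a (nonlinear) one-hot encoding of each sample, the counts become a linear function of the encoded matrix X, namely its column sums X^T·1.

```python
def counts(x, outcomes):
    """Frequency vector of x over the given outcomes."""
    return [sum(1 for v in x if v == k) for k in outcomes]

outcomes = range(1, 7)

# Counterexample to linearity: counts(a + b) != counts(a) + counts(b)
a, b = [1], [1]
a_plus_b = [u + v for u, v in zip(a, b)]        # elementwise sum -> [2]
lhs = counts(a_plus_b, outcomes)                # [0, 1, 0, 0, 0, 0]
rhs = [p + q for p, q in zip(counts(a, outcomes), counts(b, outcomes))]  # [2, 0, 0, 0, 0, 0]
print(lhs == rhs)  # False

def one_hot(x, outcomes):
    """n-by-m 0/1 matrix: row i has a single 1 in the column of outcome x[i]."""
    return [[1 if v == k else 0 for k in outcomes] for v in x]

x = [1, 2, 3, 2, 5, 2, 4, 2, 3, 1, 6, 3, 5]
X = one_hot(x, outcomes)
# Column sums of X, i.e. X^T times a vector of ones: a linear operation on X
f = [sum(row[j] for row in X) for j in range(len(X[0]))]
print(f)  # [2, 4, 3, 1, 2, 1]
```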
     
  Oct 13, 2012 #6

    Stephen Tashi

    Science Advisor

    You haven't defined a specific mathematical problem, because "random discrete sequence" isn't a specific type of random process. You must define the process that generates the sequence. For example, it isn't clear whether the selection of one term in your sequence is independent of the selection of the other terms. If you have a real-life application in mind, you'll get better advice by telling us what it is.
     
  Oct 14, 2012 #7

    There is a standard method for finding the pdf of discrete random variables (DRVs): count the number of occurrences of each particular value and divide that count by the total number of samples considered. The larger the number of samples, the more accurate the pdf estimate becomes. If you want a system that outputs the pdf for a given set of input observation samples, program the above method as the system operation and implement it. Why look for some other operation? What's the problem with the existing method?
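The count-and-divide method as a Python sketch. The fair-die data source, the seed and the helper name `empirical_pmf` are assumptions for illustration; the maximum error typically shrinks as n grows (the standard error of each relative frequency is sqrt(p(1 - p)/n)).

```python
import random

def empirical_pmf(samples, outcomes):
    """Relative frequency of each outcome: count / total number of samples."""
    n = len(samples)
    return [sum(1 for s in samples if s == k) / n for k in outcomes]

rng = random.Random(0)                 # fixed seed so the run is reproducible
outcomes = range(1, 7)
for n in (10, 1_000, 100_000):
    samples = [rng.randint(1, 6) for _ in range(n)]   # assumed: fair six-sided die
    pmf = empirical_pmf(samples, outcomes)
    worst = max(abs(p - 1 / 6) for p in pmf)          # largest deviation from 1/6
    print(n, [round(p, 3) for p in pmf], "max error:", round(worst, 4))
```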
     
  Oct 14, 2012 #8

    OK guys, I will explain why I need such a mathematical operation. I would like to illustrate the central limit theorem. We know that if x1 and x2 are two independent random variables with pdfs pdf(x1) and pdf(x2) respectively, then pdf(x1 + x2) = convolution(pdf(x1), pdf(x2)).

    So I thought of explaining it this way:

    x1 ------------------> pdf(x1) = some equation depending on x1
    x2 ------------------> pdf(x2) = some equation depending on x2

    x1 + x2 -------------> pdf(x1+x2) = some eqn depending on x1 and x2, i.e. here convolution(pdf(x1), pdf(x2))
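The discrete convolution rule, applied repeatedly, is one common way to illustrate the central limit theorem: the pmf of a sum of independent dice looks more and more bell-shaped. A Python sketch assuming fair, independent dice (pmfs stored as hypothetical value-to-probability dicts; exact fractions avoid rounding):

```python
from fractions import Fraction

def convolve_pmf(p, q):
    """pmf of X1 + X2 for independent X1 ~ p and X2 ~ q (dicts: value -> probability)."""
    r = {}
    for a, pa in p.items():
        for b, qb in q.items():
            r[a + b] = r.get(a + b, 0) + pa * qb
    return r

die = {k: Fraction(1, 6) for k in range(1, 7)}   # assumed: fair die
two = convolve_pmf(die, die)                      # sum of two dice: values 2..12
four = convolve_pmf(two, two)                     # sum of four dice: values 4..24
print(two[7])  # 1/6
print(sorted(two.items()))
```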

    Is it right that finding the discrete probability density function is similar to mapping a sequence to another domain with some loss of information?
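One piece of information that is certainly lost is the order of the samples: any rearrangement of x has the same frequency vector, so the map from x to f cannot be inverted. A quick Python check:

```python
from collections import Counter

x = [1, 2, 3, 2, 5, 2, 4, 2, 3, 1, 6, 3, 5]
y = sorted(x)   # a different sequence with the same values
z = x[::-1]     # the same values in reverse order

# All three sequences map to the same counts, so the sample order is discarded
print(Counter(x) == Counter(y) == Counter(z))  # True
print(x == y, x == z)                          # False False
```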

    TIA
     
    Last edited: Oct 14, 2012
  Oct 14, 2012 #9

    Stephen Tashi

    Science Advisor

    I see no similarity between the two situations. The pdf of x1+x2 is a function of one variable, not two variables. How can you write the pdf of a sequence as a function of one variable?
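For independent X_1 and X_2, the convolution formula makes this point explicit: the only free variable is the value s of the sum (discrete case on the left, continuous case on the right).

```latex
p_{X_1 + X_2}(s) = \sum_{k} p_{X_1}(k)\, p_{X_2}(s - k),
\qquad
f_{X_1 + X_2}(s) = \int_{-\infty}^{\infty} f_{X_1}(t)\, f_{X_2}(s - t)\, dt .
```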
     
  Oct 14, 2012 #10

    If you don't mind, please explain that to me...
     
  Oct 15, 2012 #11

    Stephen Tashi

    Science Advisor

    Explain what?
     
  Oct 15, 2012 #12

    chiro

    Science Advisor

    Hey dexterdev.

    Are you trying to take a set of data (sample data) and generate a purely symbolic representation of the PDF given that sample?
     
  Oct 15, 2012 #13

    yes.
     
  Oct 15, 2012 #14

    chiro

    Science Advisor

    Well, in that case you're going to face quite a lot of issues.

    The first thing you need to think about is what the space of possible representations will be and the technique you will use.

    There is an area (or maybe sub-area is the right word) that deals with interpolation, and it is a whole field of numerical analysis in itself. Interpolation is basically a way of generating representations of a function that pass through known points, with special properties specific to the interpolation algorithm.

    There are many interpolation algorithms. Some of the more complex ones are known as NURBS (non-uniform rational B-splines), which let you interpolate not only over points but also specify multiplicities of each point through a knot vector, so you have a lot more control over the process than with a standard Lagrange polynomial.

    The interpolation will generate polynomial expressions, but these expressions will look like garbage: with hundreds of data points you get hundreds of terms, and if this grows to thousands or tens of thousands of points, you can see that doing this is going backwards.
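Lagrange interpolation as a minimal Python sketch (the helper name `lagrange` is hypothetical; exact fractions avoid rounding). With n data points the Lagrange form has n terms and degree up to n - 1, which is the blow-up described above. Nothing forces the polynomial to stay nonnegative or sum to 1 between the data points, so it is not itself a pmf.

```python
from fractions import Fraction

def lagrange(xs, ys):
    """Return a function evaluating the Lagrange interpolating polynomial through (xs[i], ys[i])."""
    def p(t):
        total = Fraction(0)
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = Fraction(yi)
            for j, xj in enumerate(xs):
                if j != i:
                    # Basis factor: equals 1 at t = xi and 0 at t = xj
                    term *= Fraction(t - xj, xi - xj)
            total += term
        return total
    return p

# Interpolate the empirical pmf from the first post: one point per outcome
xs = [1, 2, 3, 4, 5, 6]
ys = [Fraction(c, 13) for c in [2, 4, 3, 1, 2, 1]]
p = lagrange(xs, ys)
print([p(t) == y for t, y in zip(xs, ys)])  # passes through every data point
print(p(Fraction(7, 2)))                    # a value between data points
```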

    So that's the interpolation side.

    The second approach comes from signal processing, data compression, and similar fields (these are applied everywhere, and signal processing is a field of its own for good reason).

    In signal processing you have a signal and an orthogonal basis: you project the signal onto the basis and reconstruct the signal as a linear combination of the basis vectors.

    In mathematics this is known as Fourier analysis, which deals with orthogonal functions: you take a signal, compute its component along each basis function, and get coefficients that are used to reconstruct the signal relative to that basis.

    The third approach deals with convergence to a particular model. It is used frequently in probability and statistics, and one particular method is known as expectation maximization (EM).

    This works by fitting a fixed model to arbitrary data and finding the best representation of that data relative to the model.

    So you have to assume a PDF model, and the algorithm then takes your data and provides the best fit according to that model.
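The project-and-reconstruct idea in Python, using an orthonormal cosine basis (DCT-II, a Fourier-related transform) as one example of an orthogonal basis. The choice of basis and the number of kept coefficients are assumptions for illustration, not something specified in the post; the helper names are hypothetical.

```python
import math

def dct_basis(n):
    """Orthonormal DCT-II basis vectors of length n (one list per basis vector)."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return basis

def project(signal, basis):
    """Coefficient of the signal on each basis vector (inner products)."""
    return [sum(s * b for s, b in zip(signal, vec)) for vec in basis]

def reconstruct(coeffs, basis):
    """Linear combination of the basis vectors with the given coefficients."""
    n = len(basis[0])
    return [sum(c * vec[i] for c, vec in zip(coeffs, basis)) for i in range(n)]

pmf = [2 / 13, 4 / 13, 3 / 13, 1 / 13, 2 / 13, 1 / 13]   # empirical pmf from the first post
B = dct_basis(len(pmf))
c = project(pmf, B)
full = reconstruct(c, B)                     # all coefficients: recovers pmf up to rounding
approx = reconstruct(c[:3] + [0.0] * 3, B)   # keep only 3 coefficients: a smoothed version
print([round(v, 3) for v in approx])
```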
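A minimal EM sketch in Python, assuming the model is a mixture of two 1-D Gaussians. The model choice, the initialization at the data extremes, the fixed iteration count and the synthetic data are all assumptions for illustration. It shows the point made above: you pick the model, and EM only finds the best parameters within it.

```python
import math
import random

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm2(data, iters=100):
    """EM for a 1-D mixture of two Gaussians; returns (weights, means, variances)."""
    n = len(data)
    w = [0.5, 0.5]
    mu = [min(data), max(data)]              # crude initialization at the extremes
    m = sum(data) / n
    v0 = sum((x - m) ** 2 for x in data) / n
    var = [v0, v0]
    for _ in range(iters):
        # E-step: responsibility of each component for each data point
        resp = []
        for x in data:
            a = [w[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            s = a[0] + a[1]
            resp.append([a[0] / s, a[1] / s])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
    return w, mu, var

rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
w, mu, var = em_gmm2(data)
print([round(v, 2) for v in w], [round(v, 2) for v in mu], [round(v, 2) for v in var])
```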

    The difference between the second and third approaches is that the second is an explicit technique and the third is an implicit one.

    So now you are faced with a few decisions: the first approach generates a symbolic equation that gives you nothing more useful than the raw data itself, and the other two require you to supply a basis or a model to fit to.

    You have a trade-off: either you make no assumptions about constraints and get something that just confuses you more, or you make assumptions about the constraints, which means you are pre-defining the characteristics of the model anyway and simply fitting your data to a pre-defined set of constraints.

    So what do you choose?
     
  Oct 16, 2012 #15

    I will read your explanation and reply. Thanks for your elaborate reply.
     