
Probability from an optimization problem

  1. Aug 11, 2013 #1
I have a cost function which consists of a sum of quadratic losses plus a regularization term. My question is: is there any way to infer a probability from such a cost function?
  2. jcsd
  3. Aug 11, 2013 #2



    Staff: Mentor

    Probability of what?

    It would be useful to see the full problem.
  4. Aug 12, 2013 #3
    Here is the full problem: I have a linear regression <w,x> with cost function [itex] \sum_i (y_i - \langle w, x_i \rangle)^2 + \lambda \|w\|^2 [/itex]. I need to compute p((y,x)|w). I know that it is sometimes reasonable to assume a logistic regression model, but if we cannot make such an assumption, is there any way to compute p(y|x,w) or p((y,x)|w)?
    Also, is there any way to compute the covariance of the parameter w? It is related to the above probability via the Cramér-Rao bound and the Fisher information, which I also want to compute.
    Last edited: Aug 12, 2013
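
    (Editor's note for later readers: under the common assumption that the residuals y_i - <w,x_i> are i.i.d. Gaussian with known variance, the regularized least-squares objective in post #3 is, up to constants, the negative log of a Gaussian likelihood times a Gaussian prior on w, so its minimizer is a MAP estimate and p(y|x,w) is a normal density. A minimal NumPy sketch; all data, the noise variance sigma2, and lambda below are made up for illustration:)

    ```python
    import numpy as np

    # Toy setup: every value here is invented for illustration.
    rng = np.random.default_rng(0)
    n, m = 3, 50                          # feature dimension, number of samples
    w_true = np.array([1.0, -2.0, 0.5])   # "true" weights used to generate data
    X = rng.normal(size=(m, n))
    sigma2 = 0.25                         # assumed Gaussian noise variance
    y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=m)
    lam = 0.1                             # regularization strength lambda

    # Minimizer of sum_i (y_i - <w, x_i>)^2 + lam * ||w||^2
    # (the closed-form ridge-regression solution).
    w_hat = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

    # Under the Gaussian-noise assumption, p(y | x, w) is a normal density
    # centered at <w, x> with variance sigma2.
    def p_y_given_xw(y_i, x_i, w, sigma2):
        r = y_i - x_i @ w
        return np.exp(-r**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

    print(w_hat)
    print(p_y_given_xw(y[0], X[0], w_hat, sigma2))
    ```

    This only yields a probability because of the explicit Gaussian-noise assumption; the cost function alone does not determine p(y|x,w).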
  5. Aug 12, 2013 #4

    Stephen Tashi

    Science Advisor

    That is not a clear statement of a problem. If you can't explain the problem, perhaps you can give a link to an online example of a similar one.

    The simplest way to describe a regression problem would be to describe the format of the data. Explain which variable is to be predicted from which other variable(s).

    What does "y" represent?
  6. Aug 13, 2013 #5
    Hi Stephen, here is the required information about the problem.
    The data x_i is an n-dimensional vector. We have a set {x_i}, i = 1, 2, ..., n, and y = <w,x>. By <w,x> I mean w^T x, i.e. the inner product of w and x. We want to find w, an n-dimensional vector, such that the above-mentioned cost function is minimized. y is to be predicted from x. I hope this makes the statement of the problem clear. Regarding the format of the data, we can only know that it is an n-dimensional vector.

    Thank you
    Last edited: Aug 13, 2013
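
    (Editor's note: post #3 also asks about the covariance of w and the Fisher information. Under an assumed Gaussian linear model y = <w,x> + noise with i.i.d. noise of known variance sigma2, the Fisher information matrix is X^T X / sigma2, independent of w, and its inverse is the Cramér-Rao lower bound on the covariance of any unbiased estimator of w. A sketch with made-up data:)

    ```python
    import numpy as np

    # Made-up design matrix; sigma2 is an assumed known noise variance.
    rng = np.random.default_rng(1)
    n, m = 3, 200
    X = rng.normal(size=(m, n))
    sigma2 = 0.5

    # Fisher information of w for the Gaussian linear model:
    # I(w) = X^T X / sigma2 (does not depend on w itself).
    fisher = X.T @ X / sigma2

    # Cramer-Rao lower bound on the covariance of any unbiased estimator of w.
    crlb_cov = np.linalg.inv(fisher)

    print(crlb_cov)
    ```

    Note this covers only the unregularized maximum-likelihood case; with the lambda term the estimator is biased and the Cramér-Rao bound in this form no longer applies directly.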
  7. Aug 13, 2013 #6

    Stephen Tashi

    Science Advisor

    What you said is clear. However, the meaning of p(y|x,w) or other notation involving "p(y..." is not clear. As mfb asked, what is the event "y"?

    In your problem, presumably you have data as an array of vectors

    y[1], x[1][1], x[1][2], ..., x[1][n]
    y[2], x[2][1], x[2][2], ..., x[2][n]
    ...
    y[m], x[m][1], x[m][2], ..., x[m][n]

    And you have a vector of constants

    [itex] w[1],w[2],...w[n] [/itex]

    And you have a model in the variables [itex] Y [/itex] and [itex] X[1],X[2]...X[n] [/itex]

    [itex] Y = \sum_{i=1}^n w_i X_i [/itex]

    But the meaning of "p(y...)" is not clear.
  8. Aug 19, 2013 #7
    Y is the label in this classification problem; it is +1 or -1. In fact, in this problem we are looking for a hyperplane that discriminates samples with plus labels from samples with minus labels. P(y|w,x) is the probability of the label y = +1 or -1 given w and x.
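
    (Editor's note: for a ±1 label, one common though not obligatory choice is the logistic model mentioned in post #3, p(y | x, w) = 1 / (1 + exp(-y <w,x>)), which maps the real-valued score <w,x> to a probability. A minimal sketch; the weights and input below are made up:)

    ```python
    import numpy as np

    def p_label(y, x, w):
        """Logistic model for a +/-1 label: p(y | x, w) = sigmoid(y * <w, x>)."""
        return 1.0 / (1.0 + np.exp(-y * (x @ w)))

    # Made-up example values.
    w = np.array([2.0, -1.0])
    x = np.array([0.5, 0.2])
    print(p_label(+1, x, w), p_label(-1, x, w))  # the two probabilities sum to 1
    ```

    Whether this model is appropriate depends on how the labels were generated; the cost function in post #3 by itself does not single it out.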
  9. Aug 19, 2013 #8

    Stephen Tashi

    Science Advisor

    I think you should try to give a link to some online explanation of a similar problem.

    Are you saying the weights [itex] w [/itex] must be chosen so [itex] \sum_{i=1}^n w_i X_i [/itex] is either exactly +1 or -1? Or is the sum rounded to the nearest integer? Or rounded in some other way to either +1 or -1?

    This doesn't explain what y is. To use probability, you need a "probability space". The points in the space are "outcomes" of some process. The sets that are assigned probabilities are "events". What random process generates the outcome or event y? Your problem has several y's in it. There are observations [itex] y[1], y[2],... [/itex] and there are predictions [itex] Y [/itex].