# Probability from an optimization problem

1. Aug 11, 2013

### hosseinGhafari

I have a cost function consisting of a sum of quadratic loss terms plus a term which regularizes the function. My question is: is there any way to infer a probability from such a cost function?

2. Aug 11, 2013

### Staff: Mentor

Probability of what?

It would be useful to see the full problem.

3. Aug 12, 2013

### hosseinGhafari

Here is the full problem: I have a linear regression $\langle w, x \rangle$ with cost function $\sum_i (y_i - \langle w, x_i \rangle)^2 + \lambda \|w\|^2$. I need to compute $p((y,x)|w)$. I know that sometimes it is good to assume a logistic regression distribution, but if we cannot make such an assumption, is there any way to compute $p(y|x,w)$ or $p((y,x)|w)$?
Also, is there any way to compute the covariance of the parameter $w$? Obviously, it is related to the above probability via the Cramér-Rao bound and the Fisher information. I want to compute the Fisher information as well.
Thanks

Last edited: Aug 12, 2013
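[Editor's note: one standard way to read a probability out of this cost function, not stated in the thread itself, is to *assume* Gaussian noise $y = \langle w, x \rangle + \epsilon$ with $\epsilon \sim N(0, \sigma^2)$. Under that assumption, minimizing the squared loss plus $\lambda \|w\|^2$ is MAP estimation with a Gaussian prior on $w$, $p(y|x,w)$ is a normal density, and the Fisher information is $X^T X / \sigma^2$. A minimal sketch under those assumptions, with made-up data:]

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))          # rows are the data vectors x_i
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.3                          # assumed noise standard deviation
y = X @ w_true + sigma * rng.normal(size=n)

# Ridge solution: argmin_w sum_i (y_i - <w, x_i>)^2 + lam * ||w||^2
lam = 1.0
w_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def log_p_y_given_xw(y_i, x_i, w, sigma):
    """Under the Gaussian-noise assumption, p(y | x, w) is a normal density
    with mean <w, x> and variance sigma^2."""
    mu = x_i @ w
    return -0.5 * np.log(2 * np.pi * sigma**2) - (y_i - mu) ** 2 / (2 * sigma**2)

# Fisher information for w under the same model: I(w) = X^T X / sigma^2.
# Its inverse gives an asymptotic covariance estimate for w_hat.
fisher = X.T @ X / sigma**2
cov_w = np.linalg.inv(fisher)
```

None of this follows from the cost function alone; the Gaussian noise model is the extra assumption that turns the optimization problem into a likelihood.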
4. Aug 12, 2013

### Stephen Tashi

That is not a clear statement of a problem. If you can't explain the problem, perhaps you can give a link to an online example of a similar problem.

The simplest way to describe a regression problem would be to describe the format of the data. Explain which variable is to be predicted from which other variable(s).

What does "y" represent?

5. Aug 13, 2013

### hosseinGhafari

Hi Stephen, here is the required information about the problem.
The data $x_i$ is an n-dimensional vector; we have a set $\{x_i\}$, $i = 1, 2, \dots, n$. $y = \langle w, x \rangle$. By $\langle w, x \rangle$ I mean $w^T x$, i.e. the inner product of $w$ and $x$. We want to find $w$ as an n-dimensional vector such that the above-mentioned cost function is minimized. $y$ is to be predicted from $x$. I hope that makes the statement of the problem clear. Regarding the format of the data, we only know that it is an n-dimensional vector.

Thank you

Last edited: Aug 13, 2013
6. Aug 13, 2013

### Stephen Tashi

What you said is clear. However, the meaning of p(y|x,w) or other notation involving "p(y..." is not clear. As mfb asked, what is the event "y"?

In your problem, presumably you have data as an array of vectors

y[1], x[1][1], x[1][2], ..., x[1][n]
y[2], x[2][1], x[2][2], ..., x[2][n]
...
y[m], x[m][1], x[m][2], ..., x[m][n]

And you have a vector of constants

$w[1],w[2],...w[n]$

And you have a model in the variables $Y$ and $X[1],X[2]...X[n]$

$Y = \sum_{i=1}^n w_i X_i$

But the meaning of "p(y...)" is not clear.

7. Aug 19, 2013

### hosseinGhafari

Y is the label in this classification problem; it is +1 or -1. In fact, in this problem we are trying to find a hyperplane which discriminates samples with plus or minus labels. P(y|w,x) is the probability of the label y = +1 or -1 given w and x.
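[Editor's note: if one is willing to assume the logistic model mentioned earlier in the thread, then for labels $y \in \{+1, -1\}$ the probability has the compact form $P(y|x,w) = 1/(1 + e^{-y \langle w, x \rangle})$. A minimal sketch under that assumption, with hypothetical values for $w$ and $x$:]

```python
import numpy as np

def p_y_given_xw(y, x, w):
    """Logistic model for labels y in {+1, -1}:
    P(y | x, w) = 1 / (1 + exp(-y * <w, x>))."""
    return 1.0 / (1.0 + np.exp(-y * (x @ w)))

w = np.array([2.0, -1.0])   # hypothetical weight vector
x = np.array([1.0, 0.5])    # hypothetical sample

# The two label probabilities are complementary and sum to 1:
p_plus = p_y_given_xw(+1, x, w)
p_minus = p_y_given_xw(-1, x, w)
```

Without some such distributional assumption, the hyperplane found by the optimization gives only a decision rule (the sign of $\langle w, x \rangle$), not a probability.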

8. Aug 19, 2013

### Stephen Tashi

I think you should try to give a link to some online explanation of a similar problem.

Are you saying the weights $w$ must be chosen so $\sum_{i=1}^n w_i X_i$ is either exactly +1 or -1? Or is the sum rounded to the nearest integer? Or rounded in some other way to either +1 or -1?

This doesn't explain what y is. To use probability, you need a "probability space". The points in the space are "outcomes" of some process. The sets that are assigned probabilities are "events". What random process generates the outcome or event y? Your problem has several y's in it: there are the observations $y[1], y[2], \dots$ and there is the prediction $Y$.