Probability from an optimization problem

AI Thread Summary
The discussion revolves around inferring probabilities from a cost function in a linear regression model, specifically regarding the computation of p(y|x,w) and p((y,x)|w). The cost function includes a quadratic loss and a regularization term, and the goal is to minimize this function to find the weight vector w. The participants seek clarification on the meaning of y in this context, which is identified as a binary label (+1 or -1) that the model aims to predict based on the input vectors x. There is also interest in calculating the covariance of the parameter w and the Fisher information related to the probabilities. The conversation highlights the need for a clearer definition of the probability space and the events involved in the problem.
hosseinGhafari
I have a cost function which consists of a sum of quadratic losses plus a regularization term. My question is: is there any way to infer a probability from such a cost function?
 
Probability of what?

It would be useful to see the full problem.
 
Here is the full problem: I have a linear regression, <w, x>, with cost function \sum_i (y_i - <w, x_i>)^2 + \lambda \|w\|^2. I need to compute p((y, x)|w). I know that it is sometimes reasonable to assume a logistic regression distribution, but if we cannot make such an assumption, is there any way to compute p(y|x, w) or p((y, x)|w)?
Also, is there any way to compute the covariance of the parameter w? Obviously, it is related to the above probability via the Cramér-Rao bound and the Fisher information. I also want to compute the Fisher information.
Thanks
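A minimal sketch of the fitting step, assuming the quadratic loss is the usual squared error, so that the cost is \sum_i (y_i - <w, x_i>)^2 + \lambda \|w\|^2; the function name and the numpy dependency are illustrative, not from the thread:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize sum_i (y_i - <w, x_i>)^2 + lam * ||w||^2 in closed form.

    X:   (m, n) array whose rows are the vectors x_i
    y:   (m,) array of targets y_i
    lam: regularization strength lambda (> 0)
    """
    n = X.shape[1]
    # Setting the gradient of the cost to zero gives the normal equations
    # (X^T X + lam * I) w = X^T y, solved directly here.
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
```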
 
hosseinGhafari said:
Here is the full problem: I have a linear regression, <w, x>, with cost function \sum_i (y_i - <w, x_i>)^2 + \lambda \|w\|^2.

That is not a clear statement of a problem. If you can't explain the problem, perhaps you can give a link to an online example of a similar problem.

The simplest way to describe a regression problem would be to describe the format of the data. Explain which variable is to be predicted from which other variable(s).

I need to compute p((y, x)|w)

What does "y" represent?
 
Hi Stephen, here is the required information about the problem.
Each data point x_i is an n-dimensional vector, and we have a set {x_i}, i = 1, 2, ..., n. The model is y = <w, x>; by <w, x> I mean w^T x, i.e. the inner product of w and x. We want to find an n-dimensional vector w such that the above-mentioned cost function is minimized. y is to be predicted from x. I hope this makes the statement of the problem clear. Regarding the format of the data, we can only know that each x_i is an n-dimensional vector.

Thank you
 
hosseinGhafari said:
I hope this makes the statement of the problem clear.

What you said is clear. However, the meaning of p(y|x,w) or other notation involving "p(y..." is not clear. As mfb asked, what is the event "y"?

In your problem, presumably you have data as an array of vectors

y[1], x[1][1], x[1][2], ..., x[1][n]
y[2], x[2][1], x[2][2], ..., x[2][n]
...
y[m], x[m][1], x[m][2], ..., x[m][n]

And you have a vector of constants

w[1], w[2], ..., w[n]

And you have a model in the variables Y and X[1], X[2], ..., X[n]:

Y = \sum_{i=1}^n w_i X_i
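A small sketch of this layout and model in code, with illustrative array contents (the names X, w, Y mirror the notation above):

```python
import numpy as np

# m rows of data: y[k] together with x[k][1], ..., x[k][n]
X = np.array([[1.0, 2.0, 3.0],
              [0.5, 1.5, 2.5],
              [2.0, 0.0, 1.0]])   # shape (m, n)
w = np.array([0.1, 0.2, 0.3])     # the constants w[1], ..., w[n]

# Model: Y[k] = sum_{i=1}^n w[i] * X[k][i], one prediction per data row
Y = X @ w
```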


But the meaning of "p(y...)" is not clear.
 
Y is the label in this classification problem; it is +1 or -1. In fact, in this problem we are looking for a hyperplane which discriminates the samples with plus labels from the samples with minus labels. P(y|w,x) is the probability of the label y = +1 or -1, given w and x.
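If one is willing to assume a logistic link on the score <w, x> (the assumption mentioned earlier in the thread, not something implied by the squared-loss cost itself), then p(y|x, w) for y = +1 or -1 could be written as in this sketch:

```python
import numpy as np

def p_y_given_x_w(y, x, w):
    """P(y | x, w) under an assumed logistic link on the score <w, x>.

    y: label, +1 or -1
    x: (n,) input vector
    w: (n,) weight vector
    """
    # Assumed logistic model: P(y | x, w) = 1 / (1 + exp(-y * <w, x>))
    return 1.0 / (1.0 + np.exp(-y * np.dot(w, x)))
```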
 
I think you should try to give a link to some online explanation of a similar problem.

hosseinGhafari said:
Y is the label in this classification problem; it is +1 or -1.

Are you saying the weights w must be chosen so that \sum_{i=1}^n w_i X_i is either exactly +1 or -1? Or is the sum rounded to the nearest integer, or rounded in some other way to either +1 or -1?

P(y|w,x) is the probability of the label y = +1 or -1, given w and x.

This doesn't explain what y is. To use probability, you need a "probability space". The points in the space are "outcomes" of some process. The sets that are assigned probabilities are "events". What random process generates the outcome or event y? Your problem has several y's in it. There are observations y[1], y[2],... and there are predictions Y.
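For the covariance and Fisher-information part of the original question, some model for y has to be assumed first. A minimal sketch, under the assumed logistic model P(y | x, w) = 1 / (1 + exp(-y <w, x>)) from the sketch above:

```python
import numpy as np

def fisher_information(X, w):
    """Fisher information matrix for w under the assumed logistic model
    P(y | x, w) = 1 / (1 + exp(-y * <w, x>)), with y in {+1, -1}.

    X: (m, n) array whose rows are the vectors x_i
    w: (n,) weight vector
    Returns the (n, n) matrix sum_i p_i * (1 - p_i) * x_i x_i^T.
    """
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # P(y = +1 | x_i, w)
    curvature = p * (1.0 - p)            # per-sample curvature of the log-likelihood
    return (X * curvature[:, None]).T @ X

# Under this model, the Cramer-Rao bound says any unbiased estimator of w
# has covariance at least inv(fisher_information(X, w)).
```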
 