Probability from an optimization problem

hosseinGhafari · Aug 11, 2013

I have a cost function consisting of a sum of quadratic losses plus a regularization term. My question is: is there any way to infer a probability from such a cost function?

mfb · Aug 11, 2013

Probability of what?

It would be useful to see the full problem.

hosseinGhafari · Aug 12, 2013

Here is the full problem: I have a linear regression, <w,x>, with cost function [itex]\sum_i (y_i - \langle w, x_i \rangle)^2 + \lambda \|w\|^2[/itex]. I need to compute p((y,x)|w). I know that it is sometimes good to assume a logistic regression distribution, but if we cannot make such an assumption, is there any way to compute p(y|x,w) or p((y,x)|w)?
Also, is there any way to compute the covariance of the parameter w? Obviously, it is related to the above probability via the Cramér-Rao bound and the Fisher information. I also want to compute the Fisher information.
Thanks
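
One standard reading of the question can be sketched as follows, under assumptions the thread does not confirm: the cost is read as a sum of squared residuals plus [itex]\lambda \|w\|^2[/itex], the residuals are modelled as independent Gaussian noise with variance [itex]\sigma^2[/itex], and w is given a zero-mean Gaussian prior with variance [itex]\tau^2[/itex]. The symbols [itex]\sigma[/itex] and [itex]\tau[/itex] are introduced here only for illustration.

```latex
% Assumed model (not stated in the thread):
%   y_i = w^T x_i + \epsilon_i,  \epsilon_i ~ N(0, \sigma^2),  w ~ N(0, \tau^2 I)
p(y_i \mid x_i, w) = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(y_i - w^\top x_i)^2}{2\sigma^2}\right)

-\log p(w \mid \{x_i, y_i\}) = \frac{1}{2\sigma^2}\sum_i (y_i - w^\top x_i)^2
    + \frac{1}{2\tau^2}\|w\|^2 + \text{const}

% Multiplying by 2\sigma^2 gives the regularized cost with \lambda = \sigma^2 / \tau^2.

% Fisher information for w under the same Gaussian likelihood:
I(w) = \frac{1}{\sigma^2}\sum_i x_i x_i^\top
```

Under these assumptions the Cramér-Rao bound [itex]\mathrm{Cov}(\hat{w}) \succeq I(w)^{-1}[/itex] applies to unbiased estimators of w. The regularized minimizer is biased for [itex]\lambda > 0[/itex], so the bound would not carry over to it unchanged, and for labels restricted to +1 and -1 the Gaussian noise model is at best an approximation.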

Stephen Tashi · Aug 12, 2013

hosseinGhafari said:

Here is the full problem: I have a linear regression, <w,x>, with cost function [itex]\sum_i (y_i - \langle w, x_i \rangle)^2 + \lambda \|w\|^2[/itex].

That is not a clear statement of a problem. If you can't explain the problem, perhaps you can give a link to an online example of a similar problem.

The simplest way to describe a regression problem would be to describe the format of the data. Explain which variable is to be predicted from which other variable(s).

I need to compute p((y,x)|w).

What does "y" represent?

hosseinGhafari · Aug 13, 2013

Hi Stephen, here is the required information about the problem.
The data x_i is an n-dimensional vector. We have a set {x_i}, i=1,2,...,n. y = <w,x>. By <w,x> I mean w^T x, i.e. the inner product of w and x. We want to find w, an n-dimensional vector, such that the above-mentioned cost function is minimized. y is to be predicted from x. I hope it made the statement of the problem clear. As for the format of the data, we only know that it is an n-dimensional vector.

Thank you
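
The minimization described above can be sketched in a few lines, assuming NumPy and reading the cost as a sum of squared residuals plus lambda times the squared norm of w. The function names `ridge_cost` and `ridge_fit`, the toy data, and the value of `lam` are all hypothetical and not taken from the thread.

```python
import numpy as np

def ridge_cost(w, X, y, lam):
    """Sum of squared residuals plus lam * ||w||^2."""
    residuals = y - X @ w
    return float(residuals @ residuals + lam * (w @ w))

def ridge_fit(X, y, lam):
    """Closed-form minimizer: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Hypothetical toy data: 6 samples, 2 features, labels in {+1, -1}.
X = np.array([[1.0, 2.0], [2.0, 1.5], [1.5, 3.0],
              [-1.0, -2.0], [-2.0, -1.0], [-1.5, -2.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
lam = 0.1  # assumed regularization strength

w_hat = ridge_fit(X, y, lam)
cost_at_min = ridge_cost(w_hat, X, y, lam)
```

For lam > 0 the cost is strictly convex, so the solution of the linear system is the unique minimizer and no iterative optimizer is needed for this sketch.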

Stephen Tashi · Aug 13, 2013

hosseinGhafari said:

I hope it made the statement of the problem clear.

What you said is clear. However, the meaning of p(y|x,w) or other notation involving "p(y..." is not clear. As mfb asked, what is the event "y"?

In your problem, presumably you have data as an array of vectors

y[1], x[1][1], x[1][2],...x[1][n],
y[2], x[2][1],x[2][2],... x[2][n],
...
y[m], x[m][1],x[m][2],...x[m][n]

And you have a vector of constants

[itex]w[1],w[2],...w[n][/itex]

And you have a model in the variables [itex]Y[/itex] and [itex]X[1],X[2]...X[n][/itex]

[itex]Y = \sum_{i=1}^n w[i] X[i][/itex]

But the meaning of "p(y...)" is not clear.
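
As a minimal sketch of the layout just described, assuming NumPy and made-up numbers (3 observations and 2 predictors, none of which come from the thread), the data array, the weight vector, and the model's predictions could be written like this. The helper name `predict` is hypothetical.

```python
import numpy as np

# Hypothetical data: row k holds y[k] and the predictors x[k][1..n].
y = np.array([1.0, -1.0, 1.0])      # observed values y[1..m]
X = np.array([[0.5, 1.0],
              [-1.0, 0.2],
              [2.0, -0.5]])         # predictors, shape (m, n)
w = np.array([0.4, 0.6])            # constants w[1..n]

def predict(w, x_row):
    """Model value Y = sum over i of w[i] * X[i] for one row of predictors."""
    return sum(w_i * x_i for w_i, x_i in zip(w, x_row))

# One prediction per observation; these are deterministic given w and X.
Y = np.array([predict(w, row) for row in X])
```

Nothing in this computation is random, which is why the event that "p(y...)" refers to still has to be specified separately.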

hosseinGhafari · Aug 19, 2013

Y is the label in this classification problem. It is +1 or -1. In fact, in this problem we want to find a hyperplane that discriminates between samples with plus and minus labels. P(y|w,x) is the probability of label y=1 or -1 given w and x.

Stephen Tashi · Aug 19, 2013

I think you should try to give a link to some online explanation of a similar problem.

hosseinGhafari said:

Y is the label in this classification problem. It is +1 or -1.

Are you saying the weights [itex]w[/itex] must be chosen so that [itex]\sum_{i=1}^n w[i] X[i][/itex] is either exactly +1 or -1? Or is the sum rounded to the nearest integer? Or rounded in some other way to either +1 or -1?
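
One of the readings asked about here, thresholding the sum at zero so that it maps to +1 or -1, could look like the sketch below. It only illustrates that one convention; the thread does not say which rule, if any, is intended. The function name `predict_label` and the choice to map a sum of exactly zero to +1 are assumptions.

```python
import numpy as np

def predict_label(w, x):
    """Map the weighted sum of w and x to +1 or -1 by its sign (assumed rule)."""
    score = float(np.dot(w, x))
    # Assumed tie-break: a score of exactly zero is labelled +1.
    return 1 if score >= 0.0 else -1

# Hypothetical weights and inputs.
w = np.array([1.0, -1.0])
label_a = predict_label(w, np.array([2.0, 1.0]))   # score = 1.0
label_b = predict_label(w, np.array([1.0, 2.0]))   # score = -1.0
```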

P(y|w,x) is the probability of label y=1 or -1 given w and x.

This doesn't explain what y is. To use probability, you need a "probability space". The points in the space are "outcomes" of some process. The sets that are assigned probabilities are "events". What random process generates the outcome or event y? Your problem has several y's in it. There are observations [itex]y[1], y[2],...[/itex] and there are predictions [itex]Y[/itex].
