Gradient descent, Hessian(E(W^T X)) = cov(X): why must the mean be zero?

NotASmurf
The backpropagation algorithm uses E(X, W) = 0.5(target - W^T X)^2 as the error. The paper I'm reading notes that the covariance matrix of the inputs is equal to the Hessian, and it uses that to develop its weight update rule V(k+1) = V(k) + D*V(k), a slightly modified (not relevant to my question) version of ordinary feedforward gradient descent with backpropagation. But the author uses a mean of zero for the inputs, just like in all other neural networks. Doesn't shifting the mean leave the covariance matrix unchanged, so that the eigenvectors, the entropy (which for a Gaussian depends only on det cov(X)), and the second derivatives shouldn't change either? It's not even a conjugate prior; it's not as if I'm encoding some terrible prior belief if we view the network as a distribution mapper. Why does the mean being zero matter here, and ostensibly in all ANNs? Any help appreciated.
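For concreteness, here's a quick numpy check (toy data of my own, not the paper's setup): the identification holds for zero-mean inputs, though oddly shifting the mean breaks it, which may be related to what I'm missing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inputs; for E = 0.5*(target - W^T x)^2 the per-sample Hessian
# w.r.t. W is x x^T (independent of W and target), so average it.
X = rng.normal(size=(10000, 3))                 # mean ~ 0
H = X.T @ X / len(X)
C = np.cov(X, rowvar=False, bias=True)          # population covariance
print(np.allclose(H, C, atol=1e-2))             # True: Hessian = cov(X)

# Shift the mean: cov(X) is unchanged, but the averaged Hessian is not.
Xs = X + np.array([1.0, -2.0, 0.5])
Hs = Xs.T @ Xs / len(Xs)
Cs = np.cov(Xs, rowvar=False, bias=True)        # same as C
print(np.allclose(Hs, Cs, atol=1e-2))           # False
```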
 
Sounds like the author is performing "whitening." Given some data set, center it, project it onto the eigenbasis of its covariance, and divide each component by the square root of the corresponding eigenvalue, and you get normalized data. If the data is multivariate Gaussian, the result has a mean of zero and a covariance matrix equal to the identity. It's a pretty common practice in image processing, unless there's a lot of white noise.
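In numpy terms, a minimal sketch of that whitening step (toy data of my own):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy correlated data with a nonzero mean; rows are samples.
X = rng.normal(size=(5000, 3)) @ rng.normal(size=(3, 3)).T + [2.0, -1.0, 0.3]

# Whitening: center, rotate into the covariance eigenbasis, and
# rescale each axis by 1/sqrt(eigenvalue).
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
Xw = (Xc @ evecs) / np.sqrt(evals)

print(np.round(Xw.mean(axis=0), 2))             # ~ [0 0 0]
print(np.round(np.cov(Xw, rowvar=False), 2))    # ~ identity
```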

edit: Actually misread the post, ignore me :)
 
First, thanks for responding.
"If the data is multivariate Gaussian" it is a multivariate gaussian normal yes.

"data now as a mean of zero with a covariance matrix equal to the identity." but he also performs a gaussian integral on det(lambda*I - cov(X)), if cov(X) = I then it would have no eigenvalues, which need to be unique for this algorithm to work, and B) lambda*I - cov(X) is used where cov(X) inverse should be used, i know that A-lambda*I has no inverse, I don't know the logic behind him using lambda*I-A,

Also, he says he's using something called the "standard Fresnel representation for the determinant of a symmetric matrix R," but searching for "Fresnel representation determinant" doesn't turn up anything that looks "standard" at all; the paper is over 25 years old, to be fair. Do you have any idea what that is? Because he seems to just be doing some multivariate Gaussian optimization.
 
Without reading the paper, it's hard to comment. However, there is a relationship between the Fresnel integral and the multivariate Gaussian, though I'm not well versed enough to say anything meaningful about it. I simply recall from grad school that a friend of mine researched random matrices, and his thesis was on exactly that relationship.
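If it helps, the identity I've seen go by that name (conventions vary, and I can't confirm it matches the paper's) is, for a real symmetric invertible R:

$$\int_{\mathbb{R}^n} e^{\frac{i}{2}\, x^T R\, x}\, d^n x = (2\pi)^{n/2}\, e^{\frac{i\pi}{4}\,\mathrm{sgn}(R)}\, |\det R|^{-1/2},$$

where sgn(R) is the signature of R. It's the oscillatory analogue of the usual Gaussian integral $\int e^{-\frac{1}{2} x^T A x}\, d^n x = (2\pi)^{n/2} (\det A)^{-1/2}$, which is presumably why it lets you write a determinant as an integral.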
 
"Fresnel integral" Thank you! that's just what i was looking for, the distrubutions derived for the eigenvalues from random matrices looks less arbitrary now. Last thing,

"data now as a mean of zero with a covariance matrix equal to the identity." are you saying that if mean is zero cov(X) is a diagonal with all dimensions having same magnitude?
 