# Inverse gamma distribution & simulated annealing problem

1. Dec 24, 2007

### tbishop

I've hit a problem trying to sample from an inverse gamma distribution that has been 'scaled' using a temperature variable, T. If my distribution is defined as (where the normalising constant is k = (b^a)/Gamma(a)):
IG(x|a,b) = k * x^(-a-1) * exp(-b/x)
then the scaled version is
(IG(x|a,b))^(1/T) = k^(1/T) * x^(-(a+1)/T) * exp(-b/(Tx)),
which I want to sample as part of a simulated annealing procedure.

If I am not mistaken, this is also proportional to a new IG distribution, IG(x|a',b')
where a'+1 = (a+1)/T and b'=b/T. Hence a' = (a+1-T)/T.
The problem, however, is that a' and b' must both be strictly > 0 for the distribution to be valid. Thus for temperatures T >= a+1, a' becomes zero or negative and the samples can't be drawn.
(However, I can still evaluate the IG pdf for a'<0, so I'm not sure why the condition is strictly necessary. Is it just a physical interpretation?)
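
As a rough numerical check of the reparameterisation above, the sketch below compares the tempered kernel with the IG(a', b') kernel at a few points. The function names and the values of a, b and T are illustrative assumptions, not part of the original post.

```python
import math

def tempered_ig_params(a, b, T):
    """Return (a', b') such that IG(x|a,b)^(1/T) is proportional to IG(x|a',b')."""
    return (a + 1.0 - T) / T, b / T

def log_ig_kernel(x, a, b):
    """Unnormalised log density of IG(a, b): x^(-a-1) * exp(-b/x)."""
    return -(a + 1.0) * math.log(x) - b / x

# Illustrative values only (assumed for this sketch).
a, b, T = 3.0, 2.0, 2.5
a_t, b_t = tempered_ig_params(a, b, T)

# If the two kernels are proportional, their log difference is the same at every x.
diffs = [log_ig_kernel(x, a, b) / T - log_ig_kernel(x, a_t, b_t)
         for x in (0.1, 1.0, 7.0, 50.0)]
spread = max(diffs) - min(diffs)
print("a' =", a_t, " b' =", b_t, " spread of log differences =", spread)

# For T >= a + 1 the shape a' is no longer positive.
print("a' at T = 5:", tempered_ig_params(a, b, 5.0)[0])
```

On the question of why a' > 0 is needed: the kernel x^(-a'-1) * exp(-b'/x) can be evaluated pointwise for any a', but when a' <= 0 its tail decays no faster than 1/x as x grows, so its integral over (0, infinity) diverges and no normalising constant exists. The condition is therefore about normalisability rather than physical interpretation.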

I can do the sampling for X~IG(a',b') by transformation of a Gamma variate
G(x|a,b)=k * x^(a-1) exp(-bx)
by drawing Y~G(a',b') and letting X=1/Y.
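
A minimal sketch of that transformation, using only Python's standard library, might look as follows. The helper name `sample_inverse_gamma` and the parameter values are assumptions made for illustration; note that `random.gammavariate` takes a scale as its second argument, whereas the pdf above uses b as a rate, hence the 1/b.

```python
import random

def sample_inverse_gamma(shape, b, rng):
    """Draw X ~ IG(shape, b) as X = 1/Y with Y ~ Gamma(shape, rate=b)."""
    if shape <= 0 or b <= 0:
        raise ValueError("shape and b must both be strictly positive")
    # gammavariate(alpha, beta) uses beta as a SCALE, so a rate b becomes 1/b.
    y = rng.gammavariate(shape, 1.0 / b)
    return 1.0 / y

rng = random.Random(0)          # fixed seed so the run is repeatable
a_t, b_t = 5.0, 4.0             # illustrative values for a' and b'
draws = [sample_inverse_gamma(a_t, b_t, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
exact_mean = b_t / (a_t - 1.0)  # mean of IG(a', b'), defined for a' > 1
print("sample mean:", sample_mean, " exact mean:", exact_mean)
```

The guard at the top reflects the validity condition discussed earlier: with a' <= 0 there is no proper Gamma variate to invert.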

Now the strange thing is that if I apply the same temperature scaling to the Gamma distribution instead, I get a new Gamma distribution with
a' = (a-1+T)/T
which will always be positive for all T>=1 if a>0. Whether I'm working with x or 1/x (and therefore with the IG or the Gamma) should be purely a matter of convenience and should not depend on which definition I start from, so there is something at odds here...
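
One possible way to see where the two routes part ways, offered as a sketch rather than a definitive answer: the change of variables x = 1/y carries a Jacobian factor x^(-2). Tempering the IG density raises that factor to the power 1/T as well (giving x^(-2/T)), whereas tempering the Gamma density first and then inverting leaves it as x^(-2). The two procedures therefore target different distributions unless T = 1, and the shapes differ by 2 - 2/T. The function names below are hypothetical.

```python
def tempered_ig_shape(a, T):
    # IG(x|a,b)^(1/T) ~ x^(-(a+1)/T) * exp(-b/(T x))  =>  a' + 1 = (a + 1)/T
    return (a + 1.0 - T) / T

def tempered_gamma_shape(a, T):
    # G(y|a,b)^(1/T) ~ y^((a-1)/T) * exp(-b y / T)    =>  a' - 1 = (a - 1)/T
    return (a - 1.0 + T) / T

a = 3.0  # illustrative shape parameter
for T in (1.0, 2.0, 5.0, 10.0):
    ig = tempered_ig_shape(a, T)
    g = tempered_gamma_shape(a, T)
    # The gap between the two shapes is the leftover Jacobian exponent, 2 - 2/T.
    print(f"T={T:5.1f}  IG route a'={ig:+.3f}  Gamma route a'={g:+.3f}  "
          f"gap={g - ig:.3f}  2-2/T={2 - 2 / T:.3f}")
```

At T = 1 both routes give a' = a, as expected; for T > 1 the Gamma route yields the larger shape, which is why it stays positive while the IG route does not.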

I've probably missed something obvious but I can't think what it is. Any suggestions appreciated!

2. Dec 25, 2007

### EnumaElish

I am somewhat unclear as to why you'd like to scale the pdf in the way you described. The usual approach is to scale the random variable itself, say Y = sX for a scaling constant s > 0; and if X ~ Gamma(a, b) with b read as the scale parameter, then Y ~ Gamma(a, sb). (With b as a rate, as in your exp(-bx) form, the new rate would be b/s.)

In the usual meaning of "scaling," parameter a is not affected because it is not a scaling parameter -- parameter b is.

3. Dec 25, 2007

### tbishop

I'm not referring to this normal type of "scaling"; I'm referring to raising the whole distribution to the power 1/T, which effectively "broadens" the distribution. In the case of a Gaussian, the effect is indeed simply to increase the variance. The reason I want to do this is to improve convergence rates in MCMC, by searching the space faster using simulated annealing (http://mathworld.wolfram.com/SimulatedAnnealing.html).
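
The Gaussian remark can be checked numerically with a short sketch: raising N(mu, sigma^2) to the power 1/T should be proportional to N(mu, T*sigma^2). The helper name and the values of mu, sigma^2 and T below are assumptions chosen for illustration.

```python
import math

def log_normal_pdf(x, mu, sigma2):
    """Log density of a normal distribution with mean mu and variance sigma2."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

mu, sigma2, T = 1.0, 0.5, 4.0   # illustrative values only
xs = (-3.0, 0.0, 1.0, 2.5)

# Proportional densities have a log difference that does not depend on x.
diffs = [log_normal_pdf(x, mu, sigma2) / T - log_normal_pdf(x, mu, T * sigma2)
         for x in xs]
gauss_spread = max(diffs) - min(diffs)
print("spread of log differences:", gauss_spread)
```

A spread of (numerically) zero indicates that tempering the Gaussian only inflates its variance by the factor T.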

Last edited by a moderator: May 3, 2017