
#1
Nov2612, 01:56 PM

P: 304

I'm doing a project based around the Metropolis algorithm to explore parameter spaces.The computational aspect I can understand and implement. I understand that a Markov Chain is constructed in the parameter space. However, I don't understand the theory behind why it works so well in exploring it. Being a Physics student, most of the statistical literature I have read has not helped me. Can anyone explain it to me, or point to basic references?
Thanks. 



#2
Nov2812, 01:40 AM

Sci Advisor
P: 3,178





#3
Nov2812, 05:46 AM

P: 304

If I'm not mistaken, the stationary distribution X with transition matrix P is given by
[tex]XP=X[/tex] Is that correct? 



#4
Nov2812, 07:43 AM

Mentor
P: 14,483

Metropolis Algorithm  Why does it work?
Correct. I'll let Stephen take you further down this road.
You also implied that you would like some physical analog to help you understand this. A good physical analog lies in the annealing of ferrous metals. Simulated annealing emulates this. The MetropolisHastings algorithm is at the heart of simulated annealing. Simulated annealing is essentially Markov Chain Monte Carlo by another name. 



#5
Nov2812, 11:17 AM

Sci Advisor
P: 3,178

A metaphor for teaching Markov chains involves frogs and lily pads. When a frog is on a particular lily pad at time t = n, he has a probability of staying on the pad at time t = n+1 and other probabilities of jumping to various other lily pads at that time. The probabilities depend only on which lily pad the frog is currently sitting. Instead of one frog imagine a hoard of tiny frogs that obey this process. If you begin at time t = 0 by tossing frogs on lily pads at random and let them all hop independently then there the population of frogs on each lily pad might roughly stabilize after a long time. Intuitively, the observed fraction of frogs on each pad would approximate the stationary distribution of the process. The is believable if the lily pads are arranged so a frog on one can eventually hop his way to any other pad, perhaps indirectly.  i.e. we aren't talking about a set of lily pads containing pads from two different ponds. It may be stretch to apply this metaphor to the Metropolis algorithm, but let's try. In that situation, there is some unknown probability distribution for frogs on pads. This doesn't involve letting them jump. Perhaps someone has created this distribution by placing them on the pads. Furthermore the pond is too large for you to see all the lily pads at once. Your job is to define a set of jump probabilities for each pad so that when the frogs are allowed to jump according to that Markov process, their distribution will approximately preserve the original distribution of the frogs. (i.e. the original distribution of the frogs will be the stationary distribution of the process.) You can count the number of frogs on a given pad P_a and you can count the number of frogs on the pads P_b,P_c,...around it, to which the frogs may jump. But you don't know the total number of frogs in the whole pond. How will you determine a set of jump probabilities involving pad P_a? It may or may be intutitive that you can determine the probabilities by computations using ratios of frogs , such as (number of frogs on pad P_b)/ (number of frogs on pad P_a) without knowing the original distribution of frogs (e.g. not knowing (number of frogs on pad P_b)/ (total number of frogs in the pond) ). If this is intuitive to you then, since you're doing physics, it may be sufficient explanation! If not, we must do some algebra. 



#6
Nov2812, 04:55 PM

P: 304

Your explanation makes some sense to me, but I think I need more time for it to let it sink. But at the same time I'm looking to understand it more from a mathematical point of view. I'm not sure where the proposal pdf enters the discussion here. Are you saying that the equilibrium distribution of the frogs is equivalent to the distribution we were wanting to explore?




#7
Nov2812, 06:23 PM

Sci Advisor
P: 3,178





#8
Nov2912, 12:20 PM

P: 304

I'm having some trouble thinking to this analogy in terms of a 1D problem. At each step we are just jumping to another point based on a ratio. In this analogy is there just one frog?




#9
Nov2912, 06:28 PM

Sci Advisor
P: 3,178

I prefer to think of a hoard of frogs, each acting independently. You can think of the process being executed by one frog. If you think of one frog then you have to think getting empirical evidence about probabilities of jumping from pad P_a to pad P_b by sitting and waiting for him to land on pad P_a and then observing what fraction of his total jumps from pad P_a are to pad P_b. I don't like to wait.




#10
Nov2912, 06:48 PM

P: 304

Ok, this is starting to make more sense now. Can you tell me how the factor to accept or reject a jump
[tex]\alpha=min \left( \frac{P^{(c)}}{P^{(o)}},1 \right) [/tex] enters the analogy? (P^{c} is the probability of the candidate point being considered and P^{0} is the previous point in the chain) 



#11
Nov2912, 07:38 PM

Sci Advisor
P: 3,178

I don't know if we can make a "perfect" explanation of the acceptance ratio purely from intuition. Let's do the best we can. Let's take it for granted that we are on pad [itex] p_o[/itex] and we must make a yesorno decision about whether to jump to pad [itex] p_c [/itex]. All we know about the target distribution is a ratio of probabilities. If the probability of a frog being on pad [itex] p_c [/itex] is, for example, 5 times as great as the probability of a frog being on pad [itex] p_o [/itex] then it makes sense for us (as a frog) to jump to pad [itex] p_c [/itex]. This decision to jump doesn't directly implement the factor of 5. But given we must make a yesorno decision about jumping, its the only good choice. Suppose we know that the probability of a frog being on pad [itex] p_c [/itex] on only 1/3 the probability of a frog being on pad [itex] p_c [/itex]. We can attempt to implement the ratio of 1/3 by setting the probability of jumping from [itex] p_o [/itex] to pad [itex] p_c [/itex] to be 1/3 and then make a random draw to determine whether to jump. (We didn't have such a choice in the previous situation when the ratio was 5 since we aren't allowing ourselves to transform from one frog on pad [itex] p_o [/itex] into 5 frogs on pad [itex] p_c [/itex].) 



#12
Dec112, 03:02 AM

P: 523

The acceptreject factor would have been chosen to satisfy "local balance" so that the markov chain is reversible, i.e. that the number of frogs jumping from pad 1 to pad 2 matches the number of frogs jumping from pad 2 to pad 1 (which is a slightly stronger condition than existence of an equilibrium).
The offdiagonal terms of the transition matrix will look like a(1,2)*q(1,2) where a is the (as yet unknown) acceptance factor and q is the jump probability, so the local balance requirement says p(1)*a(1,2)*q(1,2) = p(2)*a(2,1)*q(2,1). Metropolis assumes q is symmetric, so if p(1)<p(2) we can set a(1,2)=1 and a(2,1)=p(2)/p(1). HTH 



#13
Dec112, 05:22 PM

P: 304

Thanks for the detailed explanation both of you. However, from what I am gathering it seems that the algorithm should give us data for how much more likely it is to be at a point as compared to a "reference" point. But using the algorithm computationally on a toy problem reproduces the probability distribution correctly (i.e. the data points lie along the curve of the distribution). How is that the case?




#14
Dec112, 07:52 PM

P: 523

A neat example to try is points uniformly distributed on the arc of an ellipse, using a spreadsheet and never needing to calculate an elliptic integral. 



#15
Dec412, 12:56 PM

P: 304

Can you tell me why the balance equation being satisfied implies a stationary distribution exists for a chain?




#16
Dec412, 08:23 PM

Sci Advisor
P: 3,178

I don't know how far the frogsonlilypads metaphor should be pushed, but it suits my level of thinking better than respectable arguments  hence:
Imagine a hoard of independently acting frogs on the lilly pads. Suppose the current fraction of frogs on pad A is given by the function X(A). We let all the frogs execute the markov process. If the function X has "detailed balance" then , for each pair of pads A,B, the number of frogs that jumps from pad A to B will be about the same as the number of frogs that jump from pad B to pad A, so we will end up with roughly the same distribution of frogs. Shamelessly confusing the ideas of frequencies and probabilities, let N = the total number of frogs. The number that jump from pad A to pad B is about N X(A) pr(A>B) The number that jump from pad B to pad A is about N X(B) pr(B>A). So we must have X(A) pr(A>B) = X(B) Pr(B>A). 



#17
Dec512, 11:15 AM

P: 304

Great, that makes sense!
So is it the accept reject factor I mentioned earlier than ensures that there is a stationary distribution for our chain? 



#18
Dec512, 07:17 PM

Sci Advisor
P: 3,178

X(A) Pr(A>B) = X(B) Pr(B>A) Suppose X(A) < X(B) The Metropolis algorithm defines Pr(A>B) = 1 and Pr(B>A) = X(A)/X(B) Those probabilities change above equation into an identity: X(A) (1) = X(B) ( X(A)/X(B)) 


Register to reply 
Related Discussions  
Metropolis Algorithm and integration volume  General Physics  1  
MetropolisHastings algorithm  Set Theory, Logic, Probability, Statistics  1  
Using metropolis algorithm for 2D ising model  Set Theory, Logic, Probability, Statistics  0  
MetropolisMonte Carlo Algorithm  Atomic, Solid State, Comp. Physics  2  
MetropolisHastings algorithm  Set Theory, Logic, Probability, Statistics  2 