How can numpy.dot() be used in Python to solve Markov chain Monte Carlo methods?


Discussion Overview

The discussion revolves around the use of the numpy.dot() function in Python for implementing Markov chain Monte Carlo methods, particularly in the context of a provided code snippet. Participants explore the mechanics of matrix operations, the structure of the transfer matrix, and the iterative process of updating the position vector.

Discussion Character

  • Homework-related
  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • One participant expresses confusion about the numpy.dot() function and its application in the provided code, particularly regarding the transfer matrix and the position vector updates.
  • Another participant explains that numpy.zeros creates numpy arrays and clarifies the dimensions of the transfer and position arrays, noting that numpy.dot() computes the matrix product.
  • A different participant suggests using the @ operator for matrix multiplication instead of numpy.dot(), citing it as a more modern and cleaner approach in Python 3.4 and above.
  • Some participants discuss the iterative nature of the code, indicating that it applies the transfer matrix multiple times to approach a steady state, referencing the "power method" for convergence in Markov chains.
  • There is mention of the computational efficiency of different methods for matrix operations, with concerns raised about the performance of repeated matrix multiplications versus diagonalization for large powers.
  • One participant notes the importance of the structure of the transition matrix and its implications for convergence rates, mentioning the relevance of eigenvalues in this context.
  • Another participant highlights the potential for direct solutions in certain cases of stochastic matrices, contrasting it with the challenges posed by large, sparse matrices.

Areas of Agreement / Disagreement

Participants generally agree on the mechanics of matrix operations in Python and the iterative approach to finding steady states in Markov chains. However, there are differing opinions on the best practices for implementing these operations and the implications of matrix structure on computational efficiency.

Contextual Notes

Some participants mention limitations related to the computational complexity of matrix operations, particularly in the context of large matrices and the potential for diagonalization to improve efficiency. The discussion also touches on the nuances of convergence rates in Markov chains, which may depend on the specific properties of the transition matrix.

jbowers9

Homework Statement



I'm taking an online course in Stat. Mech. and we're working on Markov chain Monte Carlo methods. One of the programs supplied uses the following:
Mentor note: Added code tags
Python:
import numpy
neighbor =  [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
             [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
             [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25
position = numpy.zeros(9)
position[8] = 1.0
for t in range(10):
    print(t, '  ', ["%0.5f" % abs(i - 1.0 / 9.0) for i in position])
    position = numpy.dot(transfer, position)

I was able to deconstruct most of the code using print statements, but I'm lost on the final statement, position = numpy.dot(transfer, position). I know how to take a dot product, but I'm clueless as to how the values in position are being computed over successive iterations of t. I'm also a little fuzzy about what the transfer matrix is. Mathematically I think it's the probability matrix for the random walk, but when I print it, it looks to me like an array of strings, so I don't see how it can be used in a dot product. The [ ] brackets in the print statement aren't obvious either, since i isn't declared or initialized anywhere before its use. I'm new to Python, if you haven't guessed, though I have done some C++ programming. Any help would be greatly appreciated.

Homework Equations

##\pi_a\,\big(p(a\to b)+p(a\to c)\big) = \pi_b\, p(b\to a) + \pi_c\, p(c\to a)##

This is the global balance condition.

The Attempt at a Solution


Inserted print statements in the code.
 
One useful thing to help you understand is:
print(type(variable))
This will print out what type of object a variable is. It might be a string, an integer, a numpy array, and so forth. numpy.zeros creates a numpy array, by default an array of floats. So position is a length-9 1-D array of floats. You could also print position.shape, which will tell you the shape of a numpy array. Similarly, transfer is a 9x9 array of floats, in other words a matrix. So numpy.dot just takes the matrix product of the 9x9 array with the length-9 array, and it returns another length-9 array.
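To make the point above concrete, here is a minimal sketch (using the arrays from the original snippet) that prints the types and shapes being described:

```python
import numpy

# Rebuild the two arrays from the thread's snippet so we can inspect them.
transfer = numpy.zeros((9, 9))   # 9x9 matrix of floats
position = numpy.zeros(9)        # length-9 1-D array of floats
position[8] = 1.0

print(type(transfer))                        # <class 'numpy.ndarray'>
print(transfer.shape)                        # (9, 9)
print(position.shape)                        # (9,)
print(numpy.dot(transfer, position).shape)   # (9,)
```

Note that position has shape (9,), a 1-D array rather than a true 9x1 column, but numpy.dot treats it as a column vector here and returns another shape-(9,) array.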

The python structure [<something with an i> for i in <some list>] is just a shorthand way to loop on a variable i. It is called a "list comprehension". Below is a brief tutorial on how it works. It is very handy but confusing if you are not familiar with it.

http://www.pythonforbeginners.com/basics/list-comprehensions-in-python
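As a quick illustration of the shorthand described above, the comprehension from the OP's print statement is equivalent to an explicit loop (a plain list stands in for the numpy array here):

```python
# Stand-in for the position array at t = 0: all mass on site 8.
position = [0.0] * 8 + [1.0]

# The list comprehension from the thread's print statement...
formatted = ["%0.5f" % abs(i - 1.0 / 9.0) for i in position]

# ...is shorthand for this loop; i is bound by the comprehension itself,
# which is why it needs no prior declaration.
formatted_loop = []
for i in position:
    formatted_loop.append("%0.5f" % abs(i - 1.0 / 9.0))

print(formatted == formatted_loop)  # True
```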
 
I recognize that this is someone else's code though a couple things jump out at me.

First thing I'd point out:

If you're using Python ##\geq 3.4##, in general for matrix vector operations (i.e. ##\text{m x n }## arrays and ##\text{n}## dimensional arrays), you don't need to use np.dot(), which always felt clunky to me.

Python:
numpy.dot(transfer, position)

### equivalently, but better:

transfer @ position

If the @ sign is at the very start of a line, with nothing before it on that row, then it is a decorator. Otherwise it's used for matrix-vector multiplication these days. Note: there are differences between the @ operator and .dot() for higher-dimensional tensor operations (outside the scope here).

- - - -
Second: it bucks other conventions. In particular it is customary to use lower case for vectors and upper case for matrices, so in mathese it would be something like ##\mathbf p## for position and ##\mathbf T## for the transition (transfer) matrix.

- - - -
Now for your main question, as for what the for loop is actually doing...

Python:
for t in range(10):
    position = numpy.dot(transfer, position)

but better is

Python:
for t in range(10):
    position = Transfer @ position
which reads as
##\mathbf p^{(1)} = \mathbf T \mathbf p^{(0)}##
##\mathbf p^{(2)} = \mathbf T \mathbf p^{(1)}##
##\mathbf p^{(3)} = \mathbf T \mathbf p^{(2)}##
##\mathbf p^{(4)} = \mathbf T \mathbf p^{(3)}##
##\mathbf p^{(5)} = \mathbf T \mathbf p^{(4)}##
##\mathbf p^{(6)} = \mathbf T \mathbf p^{(5)}##
##\mathbf p^{(7)} = \mathbf T \mathbf p^{(6)}##
##\mathbf p^{(8)} = \mathbf T \mathbf p^{(7)}##
##\mathbf p^{(9)} = \mathbf T \mathbf p^{(8)}##
##\mathbf p^{(10)} = \mathbf T \mathbf p^{(9)}##

Mathematically you can collapse this into one statement:

##\mathbf p^{(10)} = \big(\mathbf T \big(\mathbf T...\big(\mathbf T\big(\mathbf T \mathbf p^{(0)}\big)\big)...\big)\big) = \mathbf T^{10} \mathbf p^{(0)}##

i.e. they are making use of associativity of matrix vector multiplication, and applying your matrix ##\mathbf T## or "Transfer", 10 times in total.
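A small sketch checking this equivalence numerically, using the thread's own arrays (the loop of ten matrix-vector products lands on the same vector as a single application of ##\mathbf T^{10}##):

```python
import numpy

# Build the transfer matrix exactly as in the OP's snippet.
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

p0 = numpy.zeros(9)
p0[8] = 1.0

# Ten successive matrix-vector products...
p = p0.copy()
for t in range(10):
    p = transfer @ p

# ...agree with one application of T**10, by associativity.
p_power = numpy.linalg.matrix_power(transfer, 10) @ p0
print(numpy.allclose(p, p_power))  # True
```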

another approach to the same result (i.e. the right hand side of the above) is to use something like

Python:
result = numpy.linalg.matrix_power(Transfer, 10) @ position

except that is wasteful, because it performs matrix-matrix multiplications, each costing more than the ##O\big(N^2\big)## of a single matrix-vector product, to reach a result that 10 matrix-vector operations deliver anyway. (Doing so could also destroy special structure that is common in practice, e.g. sparsity.)
 
Thank you StoneTemple. I will use your suggested coding for dot products and the matrix/array nomenclature. As for the code print(t, '  ', ["%0.5f" % abs(i - 1.0 / 9.0) for i in position]): the results of the position = Transfer @ position operation are being used to output a vector (not saved, mind you) after t iterations, and it approaches an equilibrium(?) condition describing where the pebble will be after a sufficiently long time. And thank you "bicycle repairman," aka phyzguy. I'm using PythonAnywhere to run the code :wink:. I finally figured out that the code is print(transfer.shape), not print(transfer.shape()). Needs further investigation...
 
jbowers9 said:
As per the code, print(t,' ',["%0.5f" % abs(i- 1.0 / 9.0) for i in position]), the results of the position = Transfer @ position operation are being used to output a matrix, not saved mind you, after t iterations that approaches an equilibrium(?) condition of where the pebble will be after a sufficiently long time.

yes... they are using what is known as the "power method": iterating toward the steady-state vector. There's a ton of literature out there, particularly on finite-state Markov chains, which discusses the nuances behind this. Iterative methods, particularly for large sparse matrices, are quite useful in practice.
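A minimal power-method sketch on the thread's matrix, iterating until the distribution stops changing (the stopping tolerance and iteration cap are arbitrary choices for illustration):

```python
import numpy

# Build the transfer matrix exactly as in the OP's snippet.
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

# Power method: start with all mass on site 8 and iterate until convergence.
p = numpy.zeros(9)
p[8] = 1.0
for _ in range(1000):
    p_next = transfer @ p
    if numpy.linalg.norm(p_next - p, 1) < 1e-12:
        break
    p = p_next

# This transfer matrix is symmetric (hence doubly stochastic), so the
# steady state is the uniform distribution: 1/9 on each site.
print(p_next)
```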
 
StoneTemplePython said:
except that is wasteful because it does 10 matrix matrix multiplications which are ##\gt O\big(N^2\big)## required for 10 matrix vector operations. (Doing so could also mess up a lot of special features that are common -- e.g. sparse matrices.)
For large powers (e.g. to study the limit) it could still be interesting. You have the additional overhead of diagonalizing the matrix once, but then you can calculate arbitrary powers in ##O(n^2)##.
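A sketch of the diagonalization idea on the thread's matrix: pay the decomposition cost once, then apply any power to a vector with a few cheap steps. (The transfer matrix here is symmetric, so numpy.linalg.eigh applies; a general matrix would need eig or a Jordan-type treatment.)

```python
import numpy

# Build the transfer matrix exactly as in the OP's snippet.
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

# One-time O(n^3) cost: transfer == V @ diag(w) @ V.T for symmetric matrices.
w, V = numpy.linalg.eigh(transfer)

def apply_power(k, p):
    # transfer**k @ p, evaluated right-to-left: two O(n^2) matrix-vector
    # products plus an O(n) elementwise scaling by the eigenvalue powers.
    return V @ ((w ** k) * (V.T @ p))

p0 = numpy.zeros(9)
p0[8] = 1.0
direct = numpy.linalg.matrix_power(transfer, 10) @ p0
print(numpy.allclose(apply_power(10, p0), direct))  # True
```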
 
mfb said:
For large powers (e.g. to study the limit) it could still be interesting. You have the additional overhead of diagonalizing the matrix once, but then you can calculate arbitrary powers in ##O(n^2)##.

diagonalization -- and related Jordan forms-- always lurk nearby here for sure. For an awful lot of markov chain problems (think directed graph of links on the internet), it's not advisable or even feasible because the graphs are massive and extremely sparse.

Other problems' structure and size may vary.

There's a ton of literature on estimating/bounding the second-largest-magnitude eigenvalue, as it gives an estimate of the rate of convergence. I had a bunch more typed on the matter (the results are extra nice since the OP's transition matrix is symmetric) but I didn't want to go off on too much of a tangent.

- - - -
side note:

For what it's worth: at the opposite extreme of sparsity, there is the case where a stochastic matrix has every entry ##\gt 0##.

In this case people can find the steady state solution directly from looking at the nullspace of ##\big(\mathbf T - \mathbf I\big)##, and directly apply a coupling argument to get an easy estimate on the exponential rate of convergence.
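A sketch of the direct nullspace approach. The thread's matrix has zero entries, so it isn't strictly in the all-positive case described above, but the chain is irreducible and the same construction works: solve ##\big(\mathbf T - \mathbf I\big)\boldsymbol\pi = \mathbf 0## with the normalization ##\sum_i \pi_i = 1## appended as an extra equation.

```python
import numpy

# Build the transfer matrix exactly as in the OP's snippet.
neighbor = [[1, 3, 0, 0], [2, 4, 0, 1], [2, 5, 1, 2],
            [4, 6, 3, 0], [5, 7, 3, 1], [5, 8, 4, 2],
            [7, 6, 6, 3], [8, 7, 6, 4], [8, 8, 7, 5]]
transfer = numpy.zeros((9, 9))
for k in range(9):
    for neigh in range(4):
        transfer[neighbor[k][neigh], k] += 0.25

n = 9
# Stack (T - I) with a row of ones enforcing sum(pi) = 1; the resulting
# overdetermined system has a unique exact solution, which lstsq recovers.
A = numpy.vstack([transfer - numpy.eye(n), numpy.ones((1, n))])
b = numpy.zeros(n + 1)
b[n] = 1.0
pi, *_ = numpy.linalg.lstsq(A, b, rcond=None)
print(pi)  # uniform for this symmetric walk
```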
 
