
#1
Mar1112, 11:00 PM

P: 194

1. The problem statement, all variables and given/known data
Suppose your data cache has 30 lines and each line can hold 10 doubles. You are performing a matrixmatrix multiplication (C=C+A*B) with square matrices of size 1000 and 10 respectively. Assume data caches are only used to cache matrix elements which are doubles. The cache replacement rule is least recently used first. Onedimensional arrays are used to represent matrices with the row major order. (a). When matrixmatrix multiplication is performed using the simple tripleloop algorithm, there are 6 versions of the algorithm (ijk, ikj, jik, jki, kij, kji). Calculate the number of cache misses for each element in each matrix for each version of the algorithm when the sizes of the matrices are 1000 and 10 respectively. 2. Relevant equations N/A 3. The attempt at a solution I'm confused about the actual matrix multiplication. From the problem I'm getting matrix A: 1000x1000 and matrix b: 10x10. How can these two be multiplied? The number of columns and rows do not match up between them. Am I misreading the problem? After the above is clarified I understand that not just one element of the matrix is transferred to the cache but how many are? since matrix A is 1000x1000 is the whole first row a0,0 to a0, 999 transfered into the cache when a0,0 is brought in? Or are just 10 element blocks brought into the cache? If only 10 are brought in at a time are they brought in sequentially or is part of matrix b brought in too? Sorry for the overload of questions, I just don't understand the problem and can't progress without understanding it. 



#2
Mar1212, 03:31 AM

HW Helper
P: 6,931

The order of memory operations will depend on the algorithm to multiply A by B. Normally a row of A is multiplied with a column of B, and the products summed and stored in some output matrix, but for this problem I assume that writes will not affect the data cache. 



#3
Mar1512, 12:34 AM

P: 194

oh! That makes so much more sense! Thank you!
You assumed correctly, writes do not affect the data cache. I'm pretty sure I figured this one out but now i'm having issues with part b which involves blocking. (b). If matrices are partitioned into block matrices with each block being a 10 by 10 matrix, then the matrixmatrix multiplication can be performed using one of the 6 blocked version algorithms (ijk, ikj, jik, jki, kij, kji). Assume the multiplication of two blocks in the inner three loops uses the same loop order as the three outer loops in the blocked version algorithms. Calculate the number of cache misses for each element in each matrix for each version of the blocked algorithm when the size of the matrices is 1000. I'm starting with ijk. Which is the algorithm:
0 c9,0…c9,9 1 a9,0…a9,9 2 b9,0…b9,9 3 b8,0…b8,9 4 b7,0…b7,9 5 b6,0…b6,9 6 b5,0…b5,9 7 b4,0…b4,9 8 b3,0…b3,9 9 b2,0…b2,9 10 b1,0…b1,9 11 b0,0…b0,9 12 c8,0…c8,9 13 a8,0…a8,9 14 c7,0…c7,9 15 a7,0…a7,9 16 c6,0…c6,9 17 a6,0…a6,9 18 c5,0…c5,9 19 a5,0…a5,9 20 c4,0…c4,9 21 a4,0…a4,9 22 c3,0…c3,9 23 a3,0…a3,9 24 c2,0…c2,9 25 a2,0…a2,9 26 c1,0…c1,9 27 a1,0…a1,9 28 c0,0…c0,9 29 a0,0…a0,9 Now, I'm having a hard time figuring out how many cache misses it will have for all subsequent blocks. I know there will probably be some values in cache that can be accessed without being pulled in again but I can't seem to figure out the pattern. Hopefully I at least have it right up to this point. 



#4
Mar1512, 02:07 AM

HW Helper
P: 6,931

Calculating cache misses 



#5
Mar1512, 11:29 PM

P: 194

Yeah... I ended up doing that.
Lets say block 0,0 of A includes a00...a09 and a00..a99 (the whole 10x10 square) block 0,0 of B includes b00...b09 and b00...99 same for block c.. so basically the r,c notation represents each 10x10 block in a matrix For block 0,0 in a * block 0,0 in b stored in block 0,0 in c: all three of the blocks need to be loaded in so it is a cache miss for the 0th column of each For block 0,0 in a * block 1,0 in b stored in block 0,0 in c: each row of b is loaded so there is a cache miss for each value in the 0th column the rows a00...a09 and c00...c09 are in cache and are bumped up to more recently used (for each multiplication with a block of b) as lines of b are brought in, the other rows of matricies a and c are pushed out of cache so there is a cache miss for each of those when they are loaded in again (the 0th column again) I kind of understand what's going on and see a pattern but I'm having a really hard time coming up with actual values for the number of cache misses in the pattern for the ijk algorithm. 


Register to reply 
Related Discussions  
Do microcontrollers have cache?  Electrical Engineering  4  
Contestant Misses First Quesiton  General Discussion  18  
Two trains leave a station, and an observer misses them both what does he see  Introductory Physics Homework  1 