Thanks for the help, I appreciate you spending time trying to figure this out. I ended up figuring out that the pos matrix needed to be transposed, so the second parameter had to be CblasTrans, and I needed to make tempPos larger. After that it worked.
All of the arrays I gave the dimensions for are in row-major order, or are supposed to be. I can't change the pos array dimensions or the rest of the program will break. It's written by someone else as a simulation and is pretty long; I'm just adding averages and a couple of other things to it.
I can...
I understand what the parameters mean. I just don't see what I'm doing wrong. I have checked all of the meanings multiple times and the values I'm passing.
the mass matrix is n (303) x 1
pos is n(303) x NDIM (3)
tempPos is NDIM x 1
tempMass is nlocal(300) x 1
mass_avg is 1x1
pos_avg is...
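For anyone comparing against their own dgemm output, here is a minimal reference sketch. It assumes the product being computed is pos^T times mass, which is what the stated dimensions suggest (NDIM x n times n x 1 gives NDIM x 1); the thread does not confirm that. The function name weighted_sum is hypothetical, and the cblas_dgemm call in the comment is shown only for comparison and is not compiled here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NDIM = 3;  // number of columns of pos, as in the post

// Plain-loop reference for out = pos^T * mass, with pos stored row-major
// as n x NDIM and mass as n x 1 (assumed operation, see the text above).
// A roughly equivalent BLAS call would be:
//   cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
//               NDIM, 1, n, 1.0, &pos[0][0], NDIM, mass, 1, 0.0, out, 1);
std::vector<double> weighted_sum(const double (*pos)[NDIM],
                                 const double* mass, std::size_t n) {
    assert(pos != nullptr && mass != nullptr);
    std::vector<double> out(NDIM, 0.0);
    for (std::size_t i = 0; i < n; ++i)         // row of pos, entry of mass
        for (std::size_t d = 0; d < NDIM; ++d)  // coordinate
            out[d] += pos[i][d] * mass[i];
    return out;
}
```

With CblasTrans on A, the M, N, K arguments describe the transposed shapes (M = NDIM, K = n), while lda stays the row length of the array as stored (NDIM).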
I'm having problems running cblas_dgemm on a matrix-matrix multiplication.
I have the following matrices
double * mass = new double[n];
double (* pos)[NDIM] = new double[n][NDIM];
double tempPos[NDIM];
double tempMass[nlocal];
double mass_avg[1];
double pos_avg[NDIM]...
Yeah... I ended up doing that.
Let's say block 0,0 of A includes a00...a09 through a90...a99 (the whole 10x10 square).
Block 0,0 of B includes b00...b09 through b90...b99.
Same for the blocks of C.
So basically the r,c notation represents each 10x10 block in a matrix.
For block 0,0 in a * block 0,0 in b...
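The block-by-block idea can be sketched in code like this. It is only an illustration under my own assumptions: square row-major matrices in flat vectors, a block size that divides n, and a hypothetical function name blocked_matmul. It does not model the cache from the homework.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C += A * B for n x n row-major matrices, processed in bs x bs blocks.
// Assumes bs divides n (for example 10x10 blocks, as in the post).
void blocked_matmul(const std::vector<double>& A, const std::vector<double>& B,
                    std::vector<double>& C, std::size_t n, std::size_t bs) {
    assert(bs > 0 && n % bs == 0);
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t bi = 0; bi < n; bi += bs)          // block row of C
        for (std::size_t bj = 0; bj < n; bj += bs)      // block column of C
            for (std::size_t bk = 0; bk < n; bk += bs)  // block index along k
                // C(bi,bj) += A(bi,bk) * B(bk,bj): one small block product
                for (std::size_t i = bi; i < bi + bs; ++i)
                    for (std::size_t k = bk; k < bk + bs; ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = bj; j < bj + bs; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The three outer loops pick which blocks of C, A and B are being combined; the three inner loops are an ordinary small matrix multiply restricted to those blocks.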
oh! That makes so much more sense! Thank you!
You assumed correctly, writes do not affect the data cache.
I'm pretty sure I figured this one out but now I'm having issues with part b which involves blocking.
(b). If matrices are partitioned into block matrices with each block being a 10...
Homework Statement
Suppose your data cache has 30 lines and each line can hold 10 doubles. You are performing a matrix-matrix multiplication (C=C+A*B) with square matrices of size 1000 and 10 respectively. Assume data caches are only used to cache matrix elements which are doubles. The cache...
Homework Statement
a) If the new floating-point unit speeds up floating-point operations by 2x on average, and floating-point operations take 20% of the original program's execution time, what is the overall speedup (ignoring the penalty to any other instruction)?
b) Now assume that speeding up...
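Part a) reads like a direct application of Amdahl's law: overall speedup = 1 / ((1 - f) + f / s), where f is the fraction of time affected and s is the speedup of that fraction. A small sketch, with a function name of my own choosing (amdahl_speedup):

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction f of the original
// execution time is accelerated by a factor s.
double amdahl_speedup(double f, double s) {
    assert(f >= 0.0 && f <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - f) + f / s);
}
```

With f = 0.2 and s = 2 this gives 1 / (0.8 + 0.1), which is about 1.11.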
Homework Statement
Server farms such as Google and Yahoo! provide enough compute capacity for the highest request rate of the day. Imagine that most of the time these servers operate at only 60% capacity. Assume further that the power does not scale linearly with the load; that is, when the...
I still don't understand it. I need someone to literally walk me through it so that I understand each step. I get what the algorithm is for but I don't understand that example I posted.
So the inverse is 67 in this case?
Here's the definition I have:
Extended Euclidean algorithm
Takes a and b
Computes r, s, t such that
r = gcd(a, b) and sa + tb = r
(only the last two terms in each of these sequences at any point in the algorithm)
Corollary. Suppose gcd(r_0, r_1) = 1. Then
r_1^(-1) mod r_0 = t_m mod r_0.
The...
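Here is a minimal sketch of that definition in code, in case stepping through it helps. The names EgcdResult, extended_gcd and mod_inverse are my own, and I'm assuming non-negative inputs that fit in 64 bits. It keeps only the last two terms of the r, s and t sequences, as the definition says.

```cpp
#include <cassert>
#include <cstdint>

struct EgcdResult { std::int64_t r, s, t; };  // r = gcd(a, b), s*a + t*b = r

// Iterative extended Euclid for non-negative a and b.
EgcdResult extended_gcd(std::int64_t a, std::int64_t b) {
    std::int64_t r0 = a, r1 = b;
    std::int64_t s0 = 1, s1 = 0;
    std::int64_t t0 = 0, t1 = 1;
    while (r1 != 0) {
        const std::int64_t q = r0 / r1;  // quotient of the current division
        std::int64_t tmp;
        tmp = r0 - q * r1; r0 = r1; r1 = tmp;  // next remainder
        tmp = s0 - q * s1; s0 = s1; s1 = tmp;  // next s coefficient
        tmp = t0 - q * t1; t0 = t1; t1 = tmp;  // next t coefficient
    }
    return {r0, s0, t0};
}

// Inverse of a modulo m, assuming gcd(a, m) == 1 and m > 1.
// Uses r_0 = m and r_1 = a, so the final t is the inverse of a mod m.
std::int64_t mod_inverse(std::int64_t a, std::int64_t m) {
    assert(m > 1);
    const EgcdResult g = extended_gcd(m, a);
    assert(g.r == 1);
    return ((g.t % m) + m) % m;  // shift a negative t into [0, m)
}
```

For example, extended_gcd(7, 3) ends with s = 1 and t = -2, since 1*7 + (-2)*3 = 1, and -2 mod 7 = 5 is the inverse of 3 modulo 7.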
Homework Statement
Prove that x^(k(p–1)+1) mod p = x mod p for all primes p and integer k ≥ 0.
Hint: Use Fermat’s Little theorem and induction on k.
Homework Equations
I understand that Fermat's little theorem is:
Let p be prime, and b ∈ Z_p. Then b^p = b (mod p).
The Attempt at...
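Before attempting the induction, the claim can be checked numerically for small cases. This is only a sanity check, not a proof, and the names mod_pow and identity_holds are my own. The induction step itself follows from x^((k+1)(p-1)+1) = x^(k(p-1)) * x^p together with x^p = x (mod p).

```cpp
#include <cassert>
#include <cstdint>

// Computes (base^exp) mod m by square-and-multiply.
// Assumes m < 2^32 so intermediate products fit in 64 bits.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp, std::uint64_t m) {
    assert(m > 0 && m < (1ULL << 32));
    std::uint64_t result = 1 % m;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = result * base % m;  // use this bit of exp
        base = base * base % m;                   // square for the next bit
        exp >>= 1;
    }
    return result;
}

// Checks x^(k(p-1)+1) mod p == x mod p for 0 <= x < 2p and 0 <= k <= kmax.
// p is assumed to be prime; the function does not verify that.
bool identity_holds(std::uint64_t p, std::uint64_t kmax) {
    for (std::uint64_t k = 0; k <= kmax; ++k)
        for (std::uint64_t x = 0; x < 2 * p; ++x)
            if (mod_pow(x, k * (p - 1) + 1, p) != x % p) return false;
    return true;
}
```

Calling identity_holds with a composite such as 4 returns false (x = 2, k = 1 gives 2^4 mod 4 = 0), which shows why the primality assumption matters.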