Why Does My LBM D2Q9 Implementation in CUDA C Not Conserve Density?

  • Thread starter csp256
  • Start date
In summary, the conversation is about a self-directed project in Cuda C focused on the lattice Boltzmann method for computational fluid dynamics. The attempt at a solution has not been successful in conserving total particle density and the code is based on a reference implementation. The implementation differs in the use of matrices and coordinate system. The user is seeking help with debugging and references for the LBM and computational physics techniques.
  • #1
csp256
2
0

Homework Statement



I am doing a self directed project in Cuda C for an undergraduate parallel computing class. I chose to focus on the lattice Boltzmann method (for computational fluid dynamics). I have not yet had a class on statistical mechanics, nor any other formal exposure to the physics at hand.

My attempt does not seem conserve the total particle density (it trends to 0 pretty quickly).

Homework Equations



I am primarily using the code on this page as my guide: http://www.exolete.com/code/lbm

It is D2Q9 using BGK. It worked on my Matlab instal without any tinkering on my part.

I think I understand the code, though I do not know the math used to get the equations for the FEQ matrix.

The Attempt at a Solution



I tried doing a straight forward implementation of the MATLAB code in Cuda C. I am not to the point of caring about performance: I am very well aware that some parts of the code are painfully slow.

I did do a couple things differently from the reference implementation. My matrices are all flat arrays, row-order. Specifically the F and BOUNCE arrays are 3 dimensional flat arrays, which are row-then-column-order. Each one of the densities at each cell is therefor stored with offset direction*(width of matrix * height of matrix)+x+(width of matrix)*y. Direction is 0 through 9, as in the reference, but because Cuda C is 0 indexed (unlike Matlab) the "center" cell is at direction 0 (instead of 9). The other directions are the same.

Also, in my implementation I used a left handed coordinate system (origin is upper left, y-axis increases downwards). For some reason I thought that was what the reference was using (whoops!). I *think* all I need to do to counter this is to flip some negative signs on the y direction. Even if I did mess something up here, I don't think it would account for the problem I am seeing.

Note also that my UX and UY matrices are computed on the host (CPU), not the device (GPU). They are just to help in debugging and aren't directly used in computation.

I have hard-coded the simulation size to 10 by 10 with two obstacles, one at (4,3) and the other at (6,6) [0 indexed]. Unlike the reference implementation I zeroed out those cells before the first propagation step. This is definitely not responsible for the discrepancy I am seeing (the per cell density decreases everywhere after the first step, not just on cells near the obstacles).

I would really appreciate it if anyone could take a look at this. My teacher can not help me (he is a physicist) and I suspect that whatever I did wrong is a fairly obvious conceptual error that I can't recognize because of my lack of experience.

Also, any (free) references on the LBM (or other practical computational physics techniques) that are suitable for a beginner would be greatly appreciated!

Code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
// #include <timer.h>
#include <project_main.h>
#define density 1.0
#define nx 10
#define ny 10
#define msize nx*ny
//#define numberOfActiveNodes msize-2 // see the comments in BOUND's declaration for a justification.
//#define deltaU 0.00001

#define cudaCheckErrors(msg) \
   do { \
      cudaError_t __err = cudaGetLastError();\
      if (__err != cudaSuccess) { \
         printf("Fatal error %s (%s at %s:%d)\n", msg, cudaGetErrorString(__err), __FILE__, __LINE__); \
         exit(1); \
      } \
   } while (0)

int main(int argc, char **argv) {
	int dev_id = 0;
	if (DeviceSelect(dev_id) < 0) {
		fprintf(stderr, "Err: No GPU device found\n");
		exit(1);
	}
	//DeviceInfo(dev_id);

	// The UX, UY, and DENSITY matrices are currently only used for debugging. 
        float *UX = (float *) malloc(msize * sizeof(float));
        float *UY = (float *) malloc(msize * sizeof(float));
        float *DENSITY = (float *) malloc(msize * sizeof(float));

	// The F matrix is 3 dimensional. (really it is a flat array). The first two dimensions are the x and y coordinates.
	// The third is 9 long, and details the densities moving in each of 9 directions.
	// 0 is "center", 1 is right, 2 is upper right, 3 is up, etc.
	float *F = NULL;
	float *BOUNCE = NULL;  
	//float *TEMP = NULL;
	int F_size = 9 * msize * sizeof(float);
	F = (float *) malloc(F_size);
	BOUNCE = (float *) malloc(F_size);
	for (int i=0; i<9*msize; i++) {
		F[i] = density/9.0;
		BOUNCE[i] = 0; // take care that this continues to be initialized to 0.
	}

	bool *OBSTACLES = NULL;
	OBSTACLES = (bool *) malloc(msize*sizeof(bool));
	for (int i=0; i<msize; i++) {
		// for debugging purposes it is easier to see how things are working if we have SOME obstruction.
		// we define the two obstructions as being at (3,5) and (7,7)... in the MATLAB code.
		// BUT we are 0 indexed here so the obstructions are at (2,4) and (6,6). 
		// This is hardcoded solely for debugging.
		OBSTACLES[i] = false;
		if (i == 4+2*nx || i == 6+6*nx) {
			OBSTACLES[i] = true;
			for (int dir=0; dir<9; ++dir) {
				F[i+dir*msize] = 0.0;
			}
		}
	} 

	float *F_d, *BOUNCE_d, *TEMP_d;
	bool *OBSTACLES_d;
	cudaMalloc(&F_d, F_size);
	cudaCheckErrors("allocating F_d");
	cudaMalloc(&BOUNCE_d, F_size);
	cudaCheckErrors("allocating BOUNCE_d");
	cudaMalloc(&OBSTACLES_d, msize*sizeof(bool));
	cudaCheckErrors("allocating OBSTACLES_d");

	cudaMemcpy(F_d, F, F_size, cudaMemcpyHostToDevice);
	cudaCheckErrors("copying F to device");
	cudaMemcpy(BOUNCE_d, BOUNCE, F_size, cudaMemcpyHostToDevice);
	cudaCheckErrors("copying BOUNCE to device");	
	cudaMemcpy(OBSTACLES_d, OBSTACLES, msize*sizeof(bool), cudaMemcpyHostToDevice);
	cudaCheckErrors("copying OBSTACLES to device");        for (int i=0; i<9; ++i) {
		printf("Printing F in direction %d\n", i);
                printMatrix(F, i);
        }

	createMatrices(F,UX,UY,DENSITY);

	for (int i=0; i<10; ++i) {
                printf("\n###########\n");
                printf("###########\n");
		printf("###########\n");
        	printf("###########  Main loop number: %d\n", (i+1));
        	printf("###########\n");
                printf("###########\n");
                printf("###########\n");

// ###############################################################################################################################
		printf("\n&&& Launching kernels.\n");
		LBM_D2Q9(F_d, BOUNCE_d, OBSTACLES_d, 1); 
// ###############################################################################################################################

		printf("\nSwapping the F_d and BOUNCE_d pointers in main.\n");
		TEMP_d = F_d;
		F_d = BOUNCE_d;
		BOUNCE_d = TEMP_d;

        	cudaMemcpy(F, F_d, F_size, cudaMemcpyDeviceToHost);
        	cudaCheckErrors("copying F_d to host");
		cudaMemcpy(BOUNCE, BOUNCE_d, F_size, cudaMemcpyDeviceToHost);
		cudaCheckErrors("copying BOUNCE_d to host");

		printf("\nThis is the F matrix:\n");
		for (int i=0; i<9; ++i) {
			printf("Printing F in direction %d\n", i);
			printMatrix(F, i);
		}

		printf("\nThis is the BOUNCE matrix:\n");
                for (int i=0; i<9; ++i) {
			printf("Printing BOUNCE in direction %d\n", i);
                        printMatrix(BOUNCE, i);
                }

		printf("\n\n\n\nMaking UX, UY, and DENSITY matrices from F...\n");
		createMatrices(F, UX, UY, DENSITY);
		printMatrix(UX, 0);
		printMatrix(UY, 0);
		printMatrix(DENSITY, 0);

	}

	cudaFree(F_d);
	cudaCheckErrors("freeing F_d");
	cudaFree(BOUNCE_d);
	cudaCheckErrors("freeing BOUNCE_d");
	cudaFree(OBSTACLES_d);
	cudaCheckErrors("freeing OBSTACLES_d");

	printf("\n\n");
//	drawMatrix();
}

void printMatrix(float *F, int direction) {
	printf("Printing some matrix in direction %d... \n", direction);
	for (int j=0; j<ny; j++) {
		for (int i=0; i<nx; i++) {
			printf("%f ", F[direction*msize + i+j*nx]);
		}
		printf("\n");
	}
	printf("\n");
}

void createMatrices(float *F, float *UX, float *UY, float *DENSITY) {
	float totalDensity = 0;

	for (int i=0; i<msize; ++i) {
		DENSITY[i] = 0;
	}

	for (int dir=0; dir<9; ++dir) {
		for (int i=0; i<msize; ++i) {
			//totalDensity += F[dir*msize + i];
			DENSITY[i] += F[dir*msize + i];
			if (dir == 8 | dir == 1 | dir == 2) {
				UX[i] += F[dir*msize + i];
			} else if (dir == 4 | dir == 5 | dir == 6) {
				UX[i] -= F[dir*msize + i];
			}
			
			if (dir == 2 | dir == 3 | dir == 4) {
				UY[i] -= F[dir*msize + i];
			} else if (dir == 6 | dir == 7 | dir == 8) {
				UY[i] += F[dir*msize + i];
			}
		}
	} 

	totalDensity = 0.0; // less floating point errors this way
	for (int i=0; i<msize; ++i) {
		totalDensity += DENSITY[i];
	}
	printf("Total density is %f\n", totalDensity);
}

Code:
#include <stdio.h>
#include <project_main.h>

#define omega 1.0

#define wrapRight(x) if (nx <= x) x = 0;
#define wrapLeft(x) if (x < 0) x = nx - 1;
#define wrapUp(y) if (y < 0) y = ny - 1;
#define wrapDown(y) if (ny <= y) y = 0;

// pass them F_old and BOUNCE, then swap the pointers after the function call
__global__ void LBM_D2Q9_Kernel_Propogate(float *F, float *F_new) {
	int x = blockIdx.x*blockDim.x + threadIdx.x;
        int y = blockIdx.y*blockDim.y + threadIdx.y;
	if (nx <= x || ny <= y) return;

	int x_target, y_target;

	// Forgive me, for I have sinned...
	// right - 1
	x_target = x+1;
	wrapRight(x_target);
	y_target = y;
	F_new[msize*1 + y_target*nx + x_target] += F[msize*1 + y*nx + x]; // indexing gets tricky here; watch yourself!

	// upper right - 2
	x_target = x+1;
	wrapRight(x_target);
	y_target = y-1; // we are using a LEFT handed coordinate system, going to upper right is decreasing y!
	wrapUp(y_target);
	F_new[msize*2 + y_target*nx + x_target] += F[msize*2 + y*nx + x];

	// up - 3
	x_target = x;
	y_target = y-1;
	wrapUp(y_target);
	F_new[msize*3 + y_target*nx + x_target] += F[msize*3 + y*nx + x];

	// upper left - 4
        x_target = x-1;
	wrapLeft(x_target);
        y_target = y-1;
	wrapUp(y_target);
        F_new[msize*4 + y_target*nx + x_target] += F[msize*4 + y*nx + x];

	// left - 5
        x_target = x-1;
	wrapLeft(x_target); 
        y_target = y;
        F_new[msize*5 + y_target*nx + x_target] += F[msize*5 + y*nx + x];

	// bottom left - 6
        x_target = x-1;
        wrapLeft(x_target);
        y_target = y+1;
	wrapDown(y_target);
        F_new[msize*6 + y_target*nx + x_target] += F[msize*6 + y*nx + x];

	// down - 7
        x_target = x;
        y_target = y+1;
        wrapDown(y_target);
        F_new[msize*7 + y_target*nx + x_target] += F[msize*7 + y*nx + x];

	// bottom right - 8
        x_target = x+1;
	wrapRight(x_target);
        y_target = y+1;
        wrapDown(y_target);
        F_new[msize*8 + y_target*nx + x_target] += F[msize*8 + y*nx + x];
	
	// *weeps openly*

	F[y*nx + x] = 0.0;
        F[msize + y*nx + x] = 0.0;
        F[2*msize + y*nx + x] = 0.0;
        F[3*msize + y*nx + x] = 0.0;
        F[4*msize + y*nx + x] = 0.0;
        F[5*msize + y*nx + x] = 0.0;
        F[6*msize + y*nx + x] = 0.0;
        F[7*msize + y*nx + x] = 0.0;
        F[8*msize + y*nx + x] = 0.0;
}
__global__ void LBM_D2Q9_Kernel_Collide(float *F, bool *OBSTACLE, float *BOUNCE) {
        int x = blockIdx.x*blockDim.x + threadIdx.x;
        int y = blockIdx.y*blockDim.y + threadIdx.y;
        if (nx <= x || ny <= y) return;

	if (OBSTACLE[y*nx + x]) {
		int target_x, target_y;

		// write densities to bounce back...
		// (reverse density direction, and move it in that direction)
		// right - 1
		target_x = x-1;
		wrapLeft(target_x);
		target_y = y;
		BOUNCE[5*msize + target_x+target_y*nx] = F[1*msize + y*nx + x];

		// upper right - 2
                target_x = x-1;
                wrapLeft(target_x);
                target_y = y+1;
		wrapDown(target_y);
                BOUNCE[6*msize + target_x+target_y*nx] = F[2*msize + y*nx + x];

		// up - 3
		target_x = x;
		target_y = y+1;
		wrapDown(target_y);
                BOUNCE[7*msize + target_x+target_y*nx] = F[3*msize + y*nx + x];

		// upper left - 4
		target_x = x+1;
		wrapRight(target_x);
		target_y = y+1;
		wrapDown(target_y);
                BOUNCE[8*msize + target_x+target_y*nx] = F[4*msize + y*nx + x];

		// left - 5	
		target_x = x+1;
		wrapRight(target_x);
		target_y = y;
                BOUNCE[msize + target_x+target_y*nx] = F[5*msize + y*nx + x];

		// bottom left - 6
		target_x = x+1;
		wrapRight(target_x);
		target_y = y-1;
		wrapUp(target_y);
                BOUNCE[2*msize + target_x+target_y*nx] = F[6*msize + y*nx + x];

		// down - 7
		target_x = x;
		target_y = y-1;
		wrapUp(target_y);
                BOUNCE[3*msize + target_x+target_y*nx] = F[7*msize + y*nx + x];

		// bottom right - 8
		target_x = x-1;
		wrapLeft(target_x);
		target_y = y-1;
		wrapUp(target_y);
                BOUNCE[4*msize + target_x+target_y*nx] = F[8*msize + y*nx + x];

		// delete the densities at this cell (they have bounced)
		F[x+y*nx] = 0.0;
		F[msize + y*nx + x] = 0;
		F[2*msize + y*nx + x] = 0;
                F[3*msize + y*nx + x] = 0;
                F[4*msize + y*nx + x] = 0;
                F[5*msize + y*nx + x] = 0;
                F[6*msize + y*nx + x] = 0;
                F[7*msize + y*nx + x] = 0;
                F[8*msize + y*nx + x] = 0;
	} else {
		// Calculate local density
		float UX, UY, temp;
		float DENSITY = F[y*nx + x];

		// right - 1
		temp = F[msize + y*nx + x];
		DENSITY += temp;
		UX = temp;

		// upper right - 2
       		temp = F[2*msize + y*nx + x];
        	DENSITY += temp;
        	UX += temp;
		UY = -temp;

		// up - 3
		temp = F[3*msize + y*nx + x];
        	DENSITY += temp;
		UY -= temp;

		// upper left - 4
        	temp = F[4*msize + y*nx + x];
        	DENSITY += temp;
		UX -= temp;
        	UY -= temp;
	
		// left - 5
        	temp = F[5*msize + y*nx + x];
        	DENSITY += temp;
        	UX -= temp;

		// bottom left - 6
        	temp = F[6*msize + y*nx + x];
        	DENSITY += temp;
        	UX -= temp;
		UY += temp;

		// bottom - 7
        	temp = F[7*msize + y*nx + x];
        	DENSITY += temp;
        	UY += temp;

		// bottom right - 8
        	temp = F[8*msize + y*nx + x];
        	DENSITY += temp;
        	UX += temp;
		UY += temp;

		// normalize the pressures
		UX /= DENSITY;
		UY /= DENSITY;

		// adjust inlet pressure
		if (x == 0) UX += deltaU;

		// calculate temp variables, U_SQU, U_C2, U_C4, U_C6, U_C8
		float U_SQU = UX*UX + UY*UY;

		// calculate equilibrium distribution for stationary particles
		F[y*nx + x] = omega*(t1*DENSITY*(1-U_SQU/(2*c_squ)))  +  (1-omega)*F[y*nx + x];	
	
		// ... nearest neighbors
		F[msize + y*nx + x] = omega*(t2*DENSITY*(1+UX/c_squ + 0.5*pow((UX/c_squ),2.0) - U_SQU/(2*c_squ) ))  +  (1-omega)*F[msize + y*nx + x];
		F[3*msize + y*nx + x] = omega*(t2*DENSITY*(1-UY/c_squ + 0.5*pow((UY/c_squ),2.0) - U_SQU/(2*c_squ) ))  +  (1-omega)*F[3*msize + y*nx + x];
		F[5*msize + y*nx + x] = omega*(t2*DENSITY*(1-UX/c_squ + 0.5*pow((UX/c_squ),2.0) - U_SQU/(2*c_squ) ))  +  (1-omega)*F[5*msize + y*nx + x];
		F[7*msize + y*nx + x] = omega*(t2*DENSITY*(1+UY/c_squ + 0.5*pow((UY/c_squ),2.0) - U_SQU/(2*c_squ) ))  +  (1-omega)*F[7*msize + y*nx + x];

		// ... next-nearest neighbors
		F[2*msize + y*nx + x] = omega*(t3*DENSITY*(1+(UX-UY)/c_squ + 0.5*pow((UX-UY)/c_squ,2.0) - U_SQU/(2*c_squ) ))  +  (1-omega)*F[2*msize + y*nx + x];
                F[4*msize + y*nx + x] = omega*(t3*DENSITY*(1+(-UX-UY)/c_squ + 0.5*pow((-UX-UY)/c_squ,2.0) - U_SQU/(2*c_squ) ))  +  (1-omega)*F[4*msize + y*nx + x];
                F[6*msize + y*nx + x] = omega*(t3*DENSITY*(1+(-UX+UY)/c_squ + 0.5*pow((-UX+UY)/c_squ,2.0) - U_SQU/(2*c_squ) ))  +  (1-omega)*F[6*msize + y*nx + x];
                F[8*msize + y*nx + x] = omega*(t3*DENSITY*(1+(UX+UY)/c_squ + 0.5*pow((UX+UY)/c_squ,2.0) - U_SQU/(2*c_squ) ))  +  (1-omega)*F[8*msize + y*nx + x];

		// Relax to equilibrium done inline, above.
	}
	// "Look on my works, ye Mighty, and despair!"
}void LBM_D2Q9(float *F_d, float *BOUNCE_d, bool *OBSTACLES_d, int iterations) {
	printf("Iterations: %d\n\n", iterations);
	float *temp;

	dim3 numberOfBlocks_Propogate(1,1,1);
	dim3 threadsPerBlock_Propogate(nx,ny,1);

	dim3 numberOfBlocks_Collide(1,1,1);
	dim3 threadsPerBlock_Collide(nx,ny,1);

	for (int i=0; i<iterations; ++i) {
		LBM_D2Q9_Kernel_Propogate<<<numberOfBlocks_Propogate, threadsPerBlock_Propogate>>>(F_d, BOUNCE_d);

		temp = BOUNCE_d;
		BOUNCE_d = F_d;
		F_d = temp;

		LBM_D2Q9_Kernel_Collide<<<numberOfBlocks_Collide, threadsPerBlock_Collide>>>(F_d, OBSTACLES_d, BOUNCE_d);
//		LBM_D2Q9_Kernel_Collide<<<numberOfBlocks_Collide, threadsPerBlock_Collide>>>(BOUNCE_d, OBSTACLES_d, F_d);
	}
}
 

FAQ: Why Does My LBM D2Q9 Implementation in CUDA C Not Conserve Density?

1. What is LBM D2Q9 BGK in Cuda C?

LBM D2Q9 BGK in Cuda C is a computational fluid dynamics (CFD) method used to simulate fluid flow and analyze its behavior. LBM stands for lattice Boltzmann method, D2Q9 refers to a specific lattice structure, BGK stands for Bhatnagar-Gross-Krook, and Cuda C is a programming language used for parallel computing.

2. What are the benefits of using LBM D2Q9 BGK in Cuda C?

Using LBM D2Q9 BGK in Cuda C allows for faster and more efficient simulations of fluid flow compared to traditional methods. It also allows for complex geometries and boundary conditions to be easily implemented, and it can take advantage of the parallel processing capabilities of modern graphics processing units (GPUs).

3. How does LBM D2Q9 BGK in Cuda C work?

LBM D2Q9 BGK in Cuda C works by dividing the fluid domain into a lattice structure and simulating the movement of particles within each cell of the lattice. The particles collide and interact with each other according to the BGK collision model, and the resulting macroscopic properties such as density and velocity are calculated. This process is repeated for each time step until the desired simulation time is reached.

4. What are the limitations of using LBM D2Q9 BGK in Cuda C?

While LBM D2Q9 BGK in Cuda C has many advantages, it also has some limitations. One limitation is the accuracy of the simulation, as it may not be suitable for highly viscous or turbulent flows. Another limitation is the complexity of implementing boundary conditions, which can be challenging for certain geometries.

5. What are some real-world applications of LBM D2Q9 BGK in Cuda C?

LBM D2Q9 BGK in Cuda C has a wide range of applications in various industries, including aerospace, automotive, and biomedical engineering. It can be used to simulate fluid flow in aerodynamics, vehicle design, and blood flow in medical devices. It can also be used in environmental studies to model air and water flow, and in the energy sector to simulate heat transfer and fluid flow in power plants.

Similar threads

Back
Top