# [Fortran] making a more efficient bilinear interpolation

1. Nov 13, 2013

### octopode

1. The problem statement, all variables and given/known data

I'm trying to write an efficient bilinear (2D)-interpolation, after reading some recipes, as a fortran-mex for Matlab that is used extensively throughout a long algorithm of solar image processing, and therefore is one of my main bottlenecks. I'm not a Fortran expert and this is one of my self-made exercise to understand and enforce efficient programming with Matlab. I managed to write a working code, partially vectorized, with one DO loop, but my poor skills makes me sure that there must be a faster and more elegant way of doing it. I was hoping, if faster, that one could avoid a DO LOOP to iterate over each given points where i need the interpolation.
Any advice that can speed up the code is most welcome.
I'm using gcc version 4.3.6.
I cannot use a later one as it the only one fully compatible with my version of Matlab. Any later version prevented the use of some embedded matlab-mex functions necessary for interfacing / displaying error messages in the matlab prompts.
My mex-setup can compile Fortran 90 and 95 (so far...)

I already have a C++ mex equivalent used so far and was hoping that a well-written Fortran code would be faster.

I'm new to this forum, I understood from the sticky notes in this forum that this was the right place. I apologize if I misread the notes and please re-direct me to any better place for this request.

2. Relevant equations

The equations for bilinear interpolation that i'm using are :

Code (Text):
f1 = wx1*Image(y1,x1) + wx2*Image(y1,x2)
f2 = wx1*Image(y2,x1) + wx2*Image(y2,x2)
IntIm= wy1*f1 + wy2*f2

x1 and x2 are the x-1 and x+1, same for y, x and y being the point at which the interpolation is needed. These coordinates are input as vectors, i.e, vector of x, 1D, and vector of y, 1D (see below for my attempt at implementing it)

wx* and wy* are the weights, which goes to NaN whenever x1 = x2 and y1 = y2

3. The attempt at a solution

Code (Text):
SUBROUTINE L2DINTERPOL3(IntIm,Image,x,y,NPts,M,N)

INTEGER*8             :: NPts,M,N,jj
INTEGER*8             :: x1(NPts),y1(NPts),x2(NPts),y2(NPts)
DOUBLE PRECISION :: IntIm(NPts)
DOUBLE PRECISION :: Image(M,N)
DOUBLE PRECISION :: xjj,yjj,f1,f2
DOUBLE PRECISION :: x(NPts),y(NPts)
DOUBLE PRECISION :: wx1(NPts),wx2(NPts),wy1(NPts),wy2(NPts)

c Take the nearest integers around the input.

x1 = FLOOR(x)
x2 = CEILING(x)
y1 = FLOOR(y)
y2 = CEILING(y)

wx1 = (x2-x)/(x2-x1)
wx2 = (x-x1)/(x2-x1)
wy1 = (y2-y)/(y2-y1)
wy2 = (y-y1)/(y2-y1)

c Whenever the input are already integers, weights go to infinity,
c so set each pair to 1 and 0.

WHERE(x1 .EQ. x2)
wx1 = 1.
wx2 = 0.
END WHERE

WHERE (y1 .EQ. y2)
wy1 = 1.
wy2 = 0.
END WHERE

WHERE((x1 .EQ. x2) .AND. (y1 .EQ. y2))
wx1 = 1.
wx2 = 0.

wy1 = 1.
wy2 = 0.
END WHERE

c Main calculation, loop over each element of the input.

DO 10 jj=1,NPts

f1 = wx1(jj)*Image(y1(jj),x1(jj)) + wx2(jj)*Image(y1(jj),x2(jj))
f2 = wx1(jj)*Image(y2(jj),x1(jj)) + wx2(jj)*Image(y2(jj),x2(jj))

IntIm(jj) = wy1(jj)*f1 + wy2(jj)*f2

10   CONTINUE

RETURN
END
1. The problem statement, all variables and given/known data

2. Relevant equations

3. The attempt at a solution

2. Nov 13, 2013

### Staff: Mentor

Where your code is spending most of its time is in your loop. To speed that up you could "unroll" the loop so that it processes more than one point per iteration. You might think about how you could process points in pairs in your loop, or in groups of four or five or even ten.

BTW, you're using a really old style of DO loop. I would think that you could use this style, which is a bit more modern:
Code (Text):
DO jj = 1, NPts
<body of loop>
END

3. Nov 14, 2013

### octopode

Ok, Thank you for this 1st reply. Are you suggesting to use a matrix-form of this calculation to "pack" a few points that are to be interpolated ? Or are you suggesting an explicit parallelization where each core get a set in parallel ? It's something i already do in Matlab, never tried it with Fortran yet.

4. Nov 14, 2013

### Staff: Mentor

Closer to the first. Instead of processing a single point in each loop iteration, process two or four or ... of them. Here's how it might look if you process two points per loop iteration.

Code (Text):
DO jj = 1, NPts - 1, 2
f1 = wx1(jj)*Image(y1(jj),x1(jj)) + wx2(jj)*Image(y1(jj),x2(jj))
f2 = wx1(jj)*Image(y2(jj),x1(jj)) + wx2(jj)*Image(y2(jj),x2(jj))
f3 = wx1(jj + 1)*Image(y1(jj+1),x1(jj+1)) + wx2(jj+1)*Image(y1(jj+1),x2(jj+1))
f4 = wx1(jj + 1)*Image(y2(jj + 1),x1(jj + 1)) + wx2(jj + 1)*Image(y2(jj + 1),x2(jj + 1))

IntIm(jj) = wy1(jj)*f1 + wy2(jj)*f2
IntIm(jj + 1) = wy1(jj + 1)*f3 + wy2(jj + 1)*f4
END

The same idea can be extended to process more points in each loop iteration. You would need to take more care than I have to ensure that the loop hits all of the points. This loop header, DO jj = 1, NPts, sets jj to 1, 2, 3, ..., NPts.

The revised loop header that I show, DO jj = 1, NPts - 1, 2, sets jj to 1, 3, 5, ... and so on, so if NPts happens to be an odd integer, you have a slight problem. If NPts is even, then there's no problem.

5. Nov 18, 2013

### octopode

Ok, i'm trying. In the meantime, i tried your style of DO LOOP. It would not compile with the END instead of the CONTINUE. It expects "END DO".

6. Nov 18, 2013

### octopode

After unrolling, and dealing with the odd even (just an IF statement upstream from the DO loops), i did just what you suggested, and this did not affect computing time at all. Actually it took 4s longer, which is about by how much time the code varies from one run to the next.

Any other idea to speed that up ? Should i unroll the loop even more ?

7. Nov 18, 2013

### Staff: Mentor

That's what I meant. It's been about 15 years since I wrote any Fortran code.

I would. How many points are you processing in each loop iteration? The example I showed did only two. The more points you process per loop iteration, the faster your code should run.

Last edited: Nov 19, 2013
8. Nov 19, 2013

### octopode

15 years ?? You're making me feel like a dinosaure by using Fortran now !! :)

The number of points to process is of the order of 10^5 points.

Last edited: Nov 19, 2013
9. Nov 19, 2013

### octopode

Another intriguing fact is that i made another version, with more checks in the loops instead of the WHERE construct at the beginning (see below). And surprisingly, with a higher load in the DO LOOP, this version of the code is 50% faster than the one in my 1st post (30s instead of 60s) !!! So, the WHERE construct is highly inefficient !!! How come ? Is the WHERE construct only a way to simplify writing at the expense of computing time ? This is surprising, i didn't expect this behavior in Fortran.
What the WHERE did was simply putting 1 and 0 whenever it encountered equal values in the compared arrays.

This also tells me that most of the time was not spent in the loop, but it was equally shared with this WHERE construct. I don't know how WHERE work but wouldn't it be a hidden DO LOOP ? Making the code enter 2 DO LOOPS with just as many points to process ?

Code (Text):

x1 = FLOOR(x)
x2 = CEILING(x)
y1 = FLOOR(y)
y2 = CEILING(y)

wx1 = (x2-x)/(x2-x1)
wx2 = (x-x1)/(x2-x1)
wy1 = (y2-y)/(y2-y1)
wy2 = (y-y1)/(y2-y1)

DO 10 jj=1,NPts

IF ((x1(jj) .EQ. x2(jj)) .OR. (y1(jj) .EQ. y2(jj))) THEN

IF ((x1(jj) .EQ. x2(jj)) .AND. (y1(jj) .NE. y2(jj))) THEN

IntIm(jj) = wy1(jj)*Image(y1(jj),x1(jj)) +
$wy2(jj)*Image(y2(jj),x1(jj)) ELSE IF ((x1(jj) .NE. x2(jj)) .AND. (y1(jj) .EQ. y2(jj))) THEN IntIm(jj) = wx1(jj)*Image(y1(jj),x1(jj)) +$               wx2(jj)*Image(y1(jj),x2(jj))

ELSE

IntIm(jj) = Image(y1(jj),x1(jj))

END IF

ELSE

f1 = wx1(jj)*Image(y1jj,x1jj) + wx2(jj)*Image(y1jj,x2jj)
f2 = wx1(jj)*Image(y2jj,x1jj) + wx2(jj)*Image(y2jj,x2jj)

IntIm(jj) = wy1(jj)*f1 + wy2(jj)*f2

END IF

10   CONTINUE
RETURN
END

10. Nov 19, 2013

### Staff: Mentor

Fortran has been around for a long time. The first class I took in Fortran was in 1980.

11. Nov 19, 2013

### Staff: Mentor

As far as I know, WHERE is a relatively recent development in Fortran. Since a WHERE block has to process multiple points, it wouldn't surprise me to learn that the compiler generates a DO loop. The thought did occur to me, though, that maybe your code was spending time in the WHERE block, but I didn't pursue that avenue.

There's a tool that programmers use when they need to optimize their code - a profiler. This tool keeps track of how many times each part of the code executes. The lines of code where a program spends its time are the places where improvements can be had.

12. Nov 19, 2013

### octopode

i'm actually profiling the code from Matlab, the problem is it cannot profile the code within the Fortran file itself. So, all i get is the time spent to execute the whole function. I would have to use the function within a fortran exec, outside Matalb. This may not be worth it for the time being. This code is microscopic, it's easy to test which block is the more time consuming.