[Fortran] making a more efficient bilinear interpolation

In summary, the author is looking for advice on how to write an efficient bilinear interpolation for 2D images in Fortran. He is new to this forum and was hoping that someone more experienced could help him. He has already written a C++ mex equivalent, but is looking for a faster way to do it.
  • #1
octopode
14
0

Homework Statement



I'm trying to write an efficient bilinear (2D)-interpolation, after reading some recipes, as a fortran-mex for Matlab that is used extensively throughout a long algorithm of solar image processing, and therefore is one of my main bottlenecks. I'm not a Fortran expert and this is one of my self-made exercise to understand and enforce efficient programming with Matlab. I managed to write a working code, partially vectorized, with one DO loop, but my poor skills makes me sure that there must be a faster and more elegant way of doing it. I was hoping, if faster, that one could avoid a DO LOOP to iterate over each given points where i need the interpolation.
Any advice that can speed up the code is most welcome.
I'm using gcc version 4.3.6.
I cannot use a later one as it the only one fully compatible with my version of Matlab. Any later version prevented the use of some embedded matlab-mex functions necessary for interfacing / displaying error messages in the MATLAB prompts.
My mex-setup can compile Fortran 90 and 95 (so far...)

I already have a C++ mex equivalent used so far and was hoping that a well-written Fortran code would be faster.

I'm new to this forum, I understood from the sticky notes in this forum that this was the right place. I apologize if I misread the notes and please re-direct me to any better place for this request.

Homework Equations



The equations for bilinear interpolation that I'm using are :

Code:
f1 = wx1*Image(y1,x1) + wx2*Image(y1,x2)
f2 = wx1*Image(y2,x1) + wx2*Image(y2,x2)
IntIm= wy1*f1 + wy2*f2

x1 and x2 are the x-1 and x+1, same for y, x and y being the point at which the interpolation is needed. These coordinates are input as vectors, i.e, vector of x, 1D, and vector of y, 1D (see below for my attempt at implementing it)

wx* and wy* are the weights, which goes to NaN whenever x1 = x2 and y1 = y2


The Attempt at a Solution



Code:
SUBROUTINE L2DINTERPOL3(IntIm,Image,x,y,NPts,M,N)

      INTEGER*8             :: NPts,M,N,jj
      INTEGER*8             :: x1(NPts),y1(NPts),x2(NPts),y2(NPts)
      DOUBLE PRECISION :: IntIm(NPts)
      DOUBLE PRECISION :: Image(M,N)
      DOUBLE PRECISION :: xjj,yjj,f1,f2
      DOUBLE PRECISION :: x(NPts),y(NPts)
      DOUBLE PRECISION :: wx1(NPts),wx2(NPts),wy1(NPts),wy2(NPts)

c Take the nearest integers around the input. 

       x1 = FLOOR(x)
       x2 = CEILING(x)
       y1 = FLOOR(y)
       y2 = CEILING(y)

       wx1 = (x2-x)/(x2-x1)
       wx2 = (x-x1)/(x2-x1)
       wy1 = (y2-y)/(y2-y1)
       wy2 = (y-y1)/(y2-y1)

c Whenever the input are already integers, weights go to infinity,
c so set each pair to 1 and 0.
 
       WHERE(x1 .EQ. x2)
          wx1 = 1.
          wx2 = 0.
       END WHERE
       
       WHERE (y1 .EQ. y2)
          wy1 = 1.
          wy2 = 0.
       END WHERE

        WHERE((x1 .EQ. x2) .AND. (y1 .EQ. y2))
          wx1 = 1.
          wx2 = 0.

          wy1 = 1.
          wy2 = 0.
       END WHERE
   
c Main calculation, loop over each element of the input. 

      DO 10 jj=1,NPts
         
       
         f1 = wx1(jj)*Image(y1(jj),x1(jj)) + wx2(jj)*Image(y1(jj),x2(jj))
         f2 = wx1(jj)*Image(y2(jj),x1(jj)) + wx2(jj)*Image(y2(jj),x2(jj))

         IntIm(jj) = wy1(jj)*f1 + wy2(jj)*f2


 10   CONTINUE

      RETURN
      END
 
Physics news on Phys.org
  • #2
octopode said:

Homework Statement



I'm trying to write an efficient bilinear (2D)-interpolation, after reading some recipes, as a fortran-mex for Matlab that is used extensively throughout a long algorithm of solar image processing, and therefore is one of my main bottlenecks. I'm not a Fortran expert and this is one of my self-made exercise to understand and enforce efficient programming with Matlab. I managed to write a working code, partially vectorized, with one DO loop, but my poor skills makes me sure that there must be a faster and more elegant way of doing it. I was hoping, if faster, that one could avoid a DO LOOP to iterate over each given points where i need the interpolation.
Any advice that can speed up the code is most welcome.
I'm using gcc version 4.3.6.
I cannot use a later one as it the only one fully compatible with my version of Matlab. Any later version prevented the use of some embedded matlab-mex functions necessary for interfacing / displaying error messages in the MATLAB prompts.
My mex-setup can compile Fortran 90 and 95 (so far...)

I already have a C++ mex equivalent used so far and was hoping that a well-written Fortran code would be faster.

I'm new to this forum, I understood from the sticky notes in this forum that this was the right place. I apologize if I misread the notes and please re-direct me to any better place for this request.

Homework Equations



The equations for bilinear interpolation that I'm using are :

Code:
f1 = wx1*Image(y1,x1) + wx2*Image(y1,x2)
f2 = wx1*Image(y2,x1) + wx2*Image(y2,x2)
IntIm= wy1*f1 + wy2*f2

x1 and x2 are the x-1 and x+1, same for y, x and y being the point at which the interpolation is needed. These coordinates are input as vectors, i.e, vector of x, 1D, and vector of y, 1D (see below for my attempt at implementing it)

wx* and wy* are the weights, which goes to NaN whenever x1 = x2 and y1 = y2


The Attempt at a Solution



Code:
SUBROUTINE L2DINTERPOL3(IntIm,Image,x,y,NPts,M,N)

      INTEGER*8             :: NPts,M,N,jj
      INTEGER*8             :: x1(NPts),y1(NPts),x2(NPts),y2(NPts)
      DOUBLE PRECISION :: IntIm(NPts)
      DOUBLE PRECISION :: Image(M,N)
      DOUBLE PRECISION :: xjj,yjj,f1,f2
      DOUBLE PRECISION :: x(NPts),y(NPts)
      DOUBLE PRECISION :: wx1(NPts),wx2(NPts),wy1(NPts),wy2(NPts)

c Take the nearest integers around the input. 

       x1 = FLOOR(x)
       x2 = CEILING(x)
       y1 = FLOOR(y)
       y2 = CEILING(y)

       wx1 = (x2-x)/(x2-x1)
       wx2 = (x-x1)/(x2-x1)
       wy1 = (y2-y)/(y2-y1)
       wy2 = (y-y1)/(y2-y1)

c Whenever the input are already integers, weights go to infinity,
c so set each pair to 1 and 0.
 
       WHERE(x1 .EQ. x2)
          wx1 = 1.
          wx2 = 0.
       END WHERE
       
       WHERE (y1 .EQ. y2)
          wy1 = 1.
          wy2 = 0.
       END WHERE

        WHERE((x1 .EQ. x2) .AND. (y1 .EQ. y2))
          wx1 = 1.
          wx2 = 0.

          wy1 = 1.
          wy2 = 0.
       END WHERE
   
c Main calculation, loop over each element of the input. 

      DO 10 jj=1,NPts
         
       
         f1 = wx1(jj)*Image(y1(jj),x1(jj)) + wx2(jj)*Image(y1(jj),x2(jj))
         f2 = wx1(jj)*Image(y2(jj),x1(jj)) + wx2(jj)*Image(y2(jj),x2(jj))

         IntIm(jj) = wy1(jj)*f1 + wy2(jj)*f2


 10   CONTINUE

      RETURN
      END
Where your code is spending most of its time is in your loop. To speed that up you could "unroll" the loop so that it processes more than one point per iteration. You might think about how you could process points in pairs in your loop, or in groups of four or five or even ten.

BTW, you're using a really old style of DO loop. I would think that you could use this style, which is a bit more modern:
Code:
DO jj = 1, NPts
   <body of loop>
END
 
  • #3
Ok, Thank you for this 1st reply. Are you suggesting to use a matrix-form of this calculation to "pack" a few points that are to be interpolated ? Or are you suggesting an explicit parallelization where each core get a set in parallel ? It's something i already do in Matlab, never tried it with Fortran yet.
 
  • #4
Closer to the first. Instead of processing a single point in each loop iteration, process two or four or ... of them. Here's how it might look if you process two points per loop iteration.

Code:
DO jj = 1, NPts - 1, 2      
         f1 = wx1(jj)*Image(y1(jj),x1(jj)) + wx2(jj)*Image(y1(jj),x2(jj))
         f2 = wx1(jj)*Image(y2(jj),x1(jj)) + wx2(jj)*Image(y2(jj),x2(jj))
         f3 = wx1(jj + 1)*Image(y1(jj+1),x1(jj+1)) + wx2(jj+1)*Image(y1(jj+1),x2(jj+1))
         f4 = wx1(jj + 1)*Image(y2(jj + 1),x1(jj + 1)) + wx2(jj + 1)*Image(y2(jj + 1),x2(jj + 1))

         IntIm(jj) = wy1(jj)*f1 + wy2(jj)*f2
         IntIm(jj + 1) = wy1(jj + 1)*f3 + wy2(jj + 1)*f4
END

The same idea can be extended to process more points in each loop iteration. You would need to take more care than I have to ensure that the loop hits all of the points. This loop header, DO jj = 1, NPts, sets jj to 1, 2, 3, ..., NPts.

The revised loop header that I show, DO jj = 1, NPts - 1, 2, sets jj to 1, 3, 5, ... and so on, so if NPts happens to be an odd integer, you have a slight problem. If NPts is even, then there's no problem.
 
  • #5
Ok, I'm trying. In the meantime, i tried your style of DO LOOP. It would not compile with the END instead of the CONTINUE. It expects "END DO".
 
  • #6
After unrolling, and dealing with the odd even (just an IF statement upstream from the DO loops), i did just what you suggested, and this did not affect computing time at all. Actually it took 4s longer, which is about by how much time the code varies from one run to the next.

Any other idea to speed that up ? Should i unroll the loop even more ?
 
  • #7
octopode said:
Ok, I'm trying. In the meantime, i tried your style of DO LOOP. It would not compile with the END instead of the CONTINUE. It expects "END DO".
That's what I meant. It's been about 15 years since I wrote any Fortran code.

octopode said:
After unrolling, and dealing with the odd even (just an IF statement upstream from the DO loops), i did just what you suggested, and this did not affect computing time at all. Actually it took 4s longer, which is about by how much time the code varies from one run to the next.

Any other idea to speed that up ? Should i unroll the loop even more ?
I would. How many points are you processing in each loop iteration? The example I showed did only two. The more points you process per loop iteration, the faster your code should run.
 
Last edited:
  • #8
15 years ?? You're making me feel like a dinosaure by using Fortran now ! :)

The number of points to process is of the order of 10^5 points.
 
Last edited:
  • #9
Another intriguing fact is that i made another version, with more checks in the loops instead of the WHERE construct at the beginning (see below). And surprisingly, with a higher load in the DO LOOP, this version of the code is 50% faster than the one in my 1st post (30s instead of 60s) ! So, the WHERE construct is highly inefficient ! How come ? Is the WHERE construct only a way to simplify writing at the expense of computing time ? This is surprising, i didn't expect this behavior in Fortran.
What the WHERE did was simply putting 1 and 0 whenever it encountered equal values in the compared arrays.

This also tells me that most of the time was not spent in the loop, but it was equally shared with this WHERE construct. I don't know how WHERE work but wouldn't it be a hidden DO LOOP ? Making the code enter 2 DO LOOPS with just as many points to process ?

Code:
       x1 = FLOOR(x)
       x2 = CEILING(x)
       y1 = FLOOR(y)
       y2 = CEILING(y)

       wx1 = (x2-x)/(x2-x1)
       wx2 = (x-x1)/(x2-x1)
       wy1 = (y2-y)/(y2-y1)
       wy2 = (y-y1)/(y2-y1)

      DO 10 jj=1,NPts
      
         IF ((x1(jj) .EQ. x2(jj)) .OR. (y1(jj) .EQ. y2(jj))) THEN
        
          IF ((x1(jj) .EQ. x2(jj)) .AND. (y1(jj) .NE. y2(jj))) THEN
             
         IntIm(jj) = wy1(jj)*Image(y1(jj),x1(jj)) +
     $               wy2(jj)*Image(y2(jj),x1(jj)) 

          ELSE IF ((x1(jj) .NE. x2(jj)) .AND. (y1(jj) .EQ. y2(jj))) THEN

         IntIm(jj) = wx1(jj)*Image(y1(jj),x1(jj)) +
     $               wx2(jj)*Image(y1(jj),x2(jj))

          ELSE
             
         IntIm(jj) = Image(y1(jj),x1(jj))

             END IF
 
         ELSE
        
         f1 = wx1(jj)*Image(y1jj,x1jj) + wx2(jj)*Image(y1jj,x2jj)
         f2 = wx1(jj)*Image(y2jj,x1jj) + wx2(jj)*Image(y2jj,x2jj)

         IntIm(jj) = wy1(jj)*f1 + wy2(jj)*f2

         END IF

 10   CONTINUE
      RETURN
      END
 
  • #10
octopode said:
15 years ?? You're making me feel like a dinosaure by using Fortran now ! :)
Fortran has been around for a long time. The first class I took in Fortran was in 1980.
octopode said:
The number of points to process is of the order of 10^5 points.
 
  • #11
octopode said:
Another intriguing fact is that i made another version, with more checks in the loops instead of the WHERE construct at the beginning (see below). And surprisingly, with a higher load in the DO LOOP, this version of the code is 50% faster than the one in my 1st post (30s instead of 60s) ! So, the WHERE construct is highly inefficient ! How come ? Is the WHERE construct only a way to simplify writing at the expense of computing time ? This is surprising, i didn't expect this behavior in Fortran.
What the WHERE did was simply putting 1 and 0 whenever it encountered equal values in the compared arrays.

This also tells me that most of the time was not spent in the loop, but it was equally shared with this WHERE construct. I don't know how WHERE work but wouldn't it be a hidden DO LOOP ? Making the code enter 2 DO LOOPS with just as many points to process ?
As far as I know, WHERE is a relatively recent development in Fortran. Since a WHERE block has to process multiple points, it wouldn't surprise me to learn that the compiler generates a DO loop. The thought did occur to me, though, that maybe your code was spending time in the WHERE block, but I didn't pursue that avenue.

There's a tool that programmers use when they need to optimize their code - a profiler. This tool keeps track of how many times each part of the code executes. The lines of code where a program spends its time are the places where improvements can be had.

octopode said:
Code:
       x1 = FLOOR(x)
       x2 = CEILING(x)
       y1 = FLOOR(y)
       y2 = CEILING(y)

       wx1 = (x2-x)/(x2-x1)
       wx2 = (x-x1)/(x2-x1)
       wy1 = (y2-y)/(y2-y1)
       wy2 = (y-y1)/(y2-y1)

      DO 10 jj=1,NPts
      
         IF ((x1(jj) .EQ. x2(jj)) .OR. (y1(jj) .EQ. y2(jj))) THEN
        
          IF ((x1(jj) .EQ. x2(jj)) .AND. (y1(jj) .NE. y2(jj))) THEN
             
         IntIm(jj) = wy1(jj)*Image(y1(jj),x1(jj)) +
     $               wy2(jj)*Image(y2(jj),x1(jj)) 

          ELSE IF ((x1(jj) .NE. x2(jj)) .AND. (y1(jj) .EQ. y2(jj))) THEN

         IntIm(jj) = wx1(jj)*Image(y1(jj),x1(jj)) +
     $               wx2(jj)*Image(y1(jj),x2(jj))

          ELSE
             
         IntIm(jj) = Image(y1(jj),x1(jj))

             END IF
 
         ELSE
        
         f1 = wx1(jj)*Image(y1jj,x1jj) + wx2(jj)*Image(y1jj,x2jj)
         f2 = wx1(jj)*Image(y2jj,x1jj) + wx2(jj)*Image(y2jj,x2jj)

         IntIm(jj) = wy1(jj)*f1 + wy2(jj)*f2

         END IF

 10   CONTINUE
      RETURN
      END
 
  • #12
i'm actually profiling the code from Matlab, the problem is it cannot profile the code within the Fortran file itself. So, all i get is the time spent to execute the whole function. I would have to use the function within a fortran exec, outside Matalb. This may not be worth it for the time being. This code is microscopic, it's easy to test which block is the more time consuming.
 

1. What is bilinear interpolation?

Bilinear interpolation is a method used in computer graphics and image processing to calculate the value of a point that falls between two given data points. It involves using a weighted average of the four nearest data points to estimate the value at the desired point.

2. Why is it important to make bilinear interpolation more efficient?

Efficient bilinear interpolation can significantly improve the performance of computer graphics and image processing applications, as these techniques often involve a large number of calculations. Making it more efficient can also save time and resources, making it a desirable goal for researchers and developers.

3. How can we make bilinear interpolation more efficient?

There are several approaches to improving the efficiency of bilinear interpolation, including optimizing the algorithm used, parallelizing the computations, and using specialized hardware or software libraries. Each of these methods may have advantages and limitations, and the most suitable approach will depend on the specific application and resources available.

4. What are the benefits of efficient bilinear interpolation?

In addition to improving the performance of computer graphics and image processing applications, efficient bilinear interpolation can also lead to higher quality results, as it minimizes errors and artifacts that may occur with less efficient methods. It can also make it possible to process larger datasets or images in real-time, enabling more advanced and complex applications.

5. Are there any challenges in making bilinear interpolation more efficient?

While there are many potential benefits to improving the efficiency of bilinear interpolation, there are also challenges that must be addressed. These may include balancing the trade-off between speed and accuracy, managing memory usage, and ensuring compatibility with different hardware and software environments. Additionally, the complexity and variability of different applications may require different approaches to optimize performance.

Similar threads

Replies
10
Views
265
  • Engineering and Comp Sci Homework Help
Replies
1
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
10
Views
2K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
1
Views
116
  • Engineering and Comp Sci Homework Help
Replies
18
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
5
Views
1K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
2
Views
2K
  • MATLAB, Maple, Mathematica, LaTeX
Replies
2
Views
2K
  • Engineering and Comp Sci Homework Help
Replies
1
Views
3K
  • Programming and Computer Science
Replies
8
Views
3K
Back
Top