Input/Output error with error code -5

  1. Dec 16, 2015 #1

    kelvin490

    Gold Member

    I have a problem running my FORTRAN program on a high-performance computing cluster. It runs well on my PC, but I want to produce a large amount of data with different initial conditions, so I put it on a cluster node with eight cores to simulate eight sets of data.

    The program runs without problems in my home directory, but since I need extra disk space, a scratch hard disk was added and I now run the programs from that disk.

    After a while the program stops with this error message:

    PGFIO/stdio: Input/output error
    PGFIO-F-/formatted write/unit=6/error code returned by host stdio - 5.
    File name = stdout formatted, sequential access record = 181
    In source file TipNew8.f90, at line number 2018
    FORTRAN STOP

    I have run it several times. A similar error occurs each time, but at a different line of the source and at a different time step, so it seems quite random. It always occurs at a line containing a "write" or "print" statement. The program runs without problems on my PC using Microsoft Visual Studio with the PGI compiler.

    Does anyone have an idea what's wrong with the program?
     
  3. Dec 16, 2015 #2

    jedishrfu

    Staff: Mentor

    Is your program writing to a disk file? Do you have enough disk space on this scratch disk?

    You could check with the "df -h" command if this is Linux.
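
A slightly fuller version of that check, as a rough sketch only. It assumes a Linux node with a POSIX shell and `df`; `SCRATCH_DIR` is a placeholder for wherever the scratch disk is mounted (the path is not given in this thread) and defaults to the current directory. On Linux, error code 5 normally corresponds to EIO, a generic input/output error, so besides free space the sketch also tries a small write, in case the disk is mounted read-only or has dropped out.

```shell
#!/bin/sh
# Placeholder: set SCRATCH_DIR to the scratch mount point (not given in the thread).
SCRATCH_DIR="${SCRATCH_DIR:-.}"

# check_scratch DIR: report free space on DIR, then confirm DIR accepts a small write.
check_scratch() {
    dir="$1"
    # Human-readable free space for the filesystem that holds DIR.
    df -h "$dir" || return 1
    # Try to create and remove a tiny probe file; $$ keeps the name unique per shell.
    probe="$dir/.write_probe.$$"
    if printf 'probe\n' > "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "write test OK in $dir"
    else
        echo "write test FAILED in $dir" >&2
        return 1
    fi
}

check_scratch "$SCRATCH_DIR"
```

Running it once on the home directory and once on the scratch disk would show whether the two behave differently. A passing write test does not rule out an intermittent fault, so it is only a first filter.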
     
  4. Dec 16, 2015 #3

    kelvin490

    Gold Member

    I have checked and there is enough space.
     
  5. Dec 17, 2015 #4

    DrClaude

    Staff: Mentor

    I suggest you contact the support staff responsible for the cluster. I don't think there is much we can do to help you without access to the system.
     
  6. Dec 17, 2015 #5
    I don't know exactly what "cluster" means here, but here are some ideas:
    • Are you compiling your program on your PC and then running it on the cluster?
    • Can you run your program on the cluster without taking advantage of the cluster aspect of it, i.e. as just one instance? Does it run that way?
    • Can you compile your program on the cluster itself, to ensure compatibility?
    • What does "cluster" mean? Many independent instances of the same program? Are they all writing to the exact same file, or are the file names different?
     
  7. Dec 18, 2015 #6
    Space shouldn't be the issue if your program checks for it before starting the heavy processing, in which case running out of space would be one handled termination mode.
    Are all I/O channel resources guaranteed to be allocated before the run?
    What about the program's error handling?

    It sounds as though something like that is happening, such as a buffer overflow somewhere, or a conflict between nodes and a timeout on disk access.

    Whether it comes from your software or from the cluster, I don't know enough to say. Could it come from the network links? Is that a possibility?

    Random means that the error is indeterminate, i.e. everything works well until the error occurs and then you have a complete collapse, as might happen if adding the scratch disk has led to an overwhelming accumulation of data.

    That's about all I know.
     
  8. Dec 18, 2015 #7

    DrClaude

    Staff: Mentor

    Now that you mention it, this is what I would investigate first. You should be careful that different nodes are not trying to write to a file at the same time. It is very good practice to have one node handle all input/output.
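
One way to keep the eight runs from sharing an output stream, sketched as a launcher script. This is only an illustration under assumptions: it treats the eight instances as independent serial runs started from a POSIX shell, and `PROGRAM` and `RUN_ROOT` are placeholder names, not taken from this thread. Each run gets its own working directory, and whatever it writes to unit 6 (stdout) goes to its own log file in that directory.

```shell
#!/bin/sh
# Placeholders: neither the executable name nor the scratch path is given in the thread.
# PROGRAM should be an absolute path, because each run changes directory first.
PROGRAM="${PROGRAM:-$PWD/a.out}"
RUN_ROOT="${RUN_ROOT:-$PWD/runs}"

# launch_runs N: start N background copies of PROGRAM, one directory and one log each.
launch_runs() {
    n="$1"
    i=1
    while [ "$i" -le "$n" ]; do
        run_dir="$RUN_ROOT/run_$i"
        mkdir -p "$run_dir" || return 1
        # Subshell so the cd stays local; stdout and stderr go to per-run files.
        ( cd "$run_dir" && "$PROGRAM" > stdout.log 2> stderr.log ) &
        i=$((i + 1))
    done
    wait   # block until every run has finished
}

# Usage: PROGRAM=/path/to/executable RUN_ROOT=/path/to/scratch sh launch.sh 8
if [ "$#" -ge 1 ]; then launch_runs "$1"; fi
```

If the error still appears with separate directories and logs, that would point away from the runs colliding with each other and back toward the scratch disk or the node itself, which is a question for the cluster's support staff.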
     