We have identified a bug on the /cluster file system which can lead to random job crashes.
The bug is triggered on the Lustre file system by a combination of running Fortran code compiled with Intel MPI.
A bug report is filed now to the storage vendor.
We will keep you updated!
Update 06-04-2018: We have found and fixed a problem on the file servers and with the tests we ran, we can not reproduce the problem anymore.
Thank you for your consideration!
Metacenter Operations