Fram Filesystem maintenance

Dear Fram users,
We have to do emergency maintenance on the Fram storage system: one of the controllers has to be rebooted to eliminate errors. During the maintenance, /cluster filesystem performance will be degraded. We will post updates here.

11:50 Maintenance is over and the controller has been rebooted. Filesystem performance is back to normal.

Best Regards

Fram off-line: File system issues

Dear Fram Users,

The ongoing problems on FRAM, reported on July 1st, cause the error message “No space left on device” for various file operations.

The problems are being investigated, and we will keep you updated on the progress.

UPDATE 2020-07-08 14:50: hugemem on Fram is now operating as normal.

UPDATE 2020-07-08 10:35: The file system issues have been resolved and we are operating as normal with the exception of hugemem, which is still unavailable. Please let us know if you’re still experiencing problems. Again we apologize for the inconvenience.

UPDATE 2020-07-08 09:00: Our vendor has corrected the filesystem bug and we should be operating as normal soon. At the moment we’re running some tests which will slow down current jobs running on Fram.

UPDATE 2020-07-07 15:35: The problem on Fram is caused by a bug in the Lustre filesystem. Our vendor is taking over the case to fix the issue. Thank you for your patience; we apologize for the inconvenience.

UPDATE 2020-07-07 09:50: We are still experiencing file system errors on FRAM, and are working to resolve the issue as soon as possible. Watch this space for updates.

UPDATE 2020-07-06 12:30: FRAM has been opened again.

UPDATE 2020-07-06 09:50: The FS is up and running; it seems to be stable, and this has also been verified by the vendor. It should be possible to use FRAM within a couple of hours.

UPDATE 2020-07-03 17:10: The FS is up and running, but we have decided to keep the machine closed over the weekend to make sure everything works as it should on Monday. Many of the recent FRAM downtimes have been caused by storage hardware faults. We are investigating the issue together with the storage vendor.

UPDATE 2020-07-02 13:20: FRAM is off-line and we are investigating the issues. The machine will probably stay off-line until tomorrow.

UPDATE 2020-07-02 12:10: The whole file system is still very unstable, and we will most likely have to take FRAM down. A Slurm reservation has been created, and all users might be kicked out soon.

UPDATE 2020-07-02 11:15: The whole file system is still very unstable and we are trying to fix the problem.

Metacenter Operations