The queueing system on Vilje has crashed. We are working on a fix.
There seems to have been a problem with the Fram /cluster file system between 23:50 on Friday and 00:30 on Saturday. Symptoms included error messages such as “No space left on device”. Because of this problem, many compute nodes have been drained. We are investigating the problem.
The Vilje queueing system was unavailable from 15:30 on Sunday the 5th until 08:30 on Monday the 6th due to a faulty InfiniBand cable.
We apologize for the inconvenience.
The NIRD storage system crashed and was unavailable for a short period of time.
Because of this crash, users logged in to NIRD and Fram experienced problems.
The problem has been resolved, and the NIRD storage system is now online.
Please contact us if you still encounter problems.
Note: The export of NIRD to Fram is currently not working.
2019-11-13 16:15: Fram is up and running again.
One of the cooling units stopped, causing the other one to stop as well, and all compute nodes went down.
Dear Fram User,
Fram is currently down, likely due to issues with the cooling distribution unit.
We are investigating the issue and working on returning Fram to production.
Apologies for the inconvenience!
Dear Fram cluster users:
login-1-2 will be reinstalled and temporarily removed from DNS. It will be added back to DNS once the reinstallation is complete.
Update 15:12: login-1-2 has been reinstalled and added back to the DNS configuration.
The node mentioned above has to be rebooted because it is unresponsive. We are sorry for any inconvenience.
The login-1-1 node hung and had to be rebooted. It is up and running again now. Have a nice weekend!
Fram login-1-2 was rebooted at around 15:10 today due to a Lustre file system glitch.
- 2019-08-26 12:45: NIRD project areas are mounted on Fram login nodes.
- 2019-08-25 14:15: The Service Platform is up now; you can log in to NIRD and access your files.
NIRD project areas will be reconnected to Fram tomorrow.
- 2019-08-23 18:42: The vendor started a forced health check on the system, which is taking more time than expected. We will reopen access to NIRD and the Service Platform as soon as the checks and rebuilds are finished.
- 2019-08-23 08:05: The storage vendor has finished the hardware replacements and the installation of new firmware on the storage system.
We are currently monitoring the storage system together with the vendor.
Dear NIRD and Service Platform users,
We have a planned downtime on the 22nd of August to replace some defective hardware. Systems will be taken offline starting at 08:00.
An engineer from the storage vendor will assist us from the very first hour.
We expect the maintenance to take one and a half days.
NIRD projects will still be accessible during the maintenance from login-trd.nird.sigma2.no, but in read-only mode.
We will keep you updated here.
Sorry for the short notice.