Dear Fram cluster users:
We have a problem with the cooling system in the Fram machine room.
Because of this, we have to reduce the load on the cluster by reserving the entire cluster, which means no jobs will run.
We are sorry for the inconvenience, and we will keep you updated.
Update 2019.07.11 08:00: Fram should be fully operational again. We are monitoring the machine and releasing compute nodes back to production.
Update 14:55: Some of the nodes have crashed, so some jobs may have been killed.
Update 2019.07.03 10:55: To keep the machine room temperature reasonably low with only one working CDU, we have kept 495 nodes in the maintenance state, while 197 nodes are in the down state. We will monitor the power consumption in the machine room and release more nodes accordingly.
Update 2019.07.08 11:55: Fram is expected to be back to its full capacity on Wednesday, 2019.07.10.