Fram maintenance 27th July

The postponed Fram cooling maintenance will take place from Wednesday 27th July at 07:00 until Thursday 28th July at 20:00. Depending on the amount of work and diagnostics needed, it might be shorter or longer.

The maintenance will be conducted to diagnose and mitigate recent cooling issues and related crashes we have seen on Fram. We hope to have a better view of the situation and an action plan for making the whole cluster available for production. Currently we are at 80% capacity and will continue at that level to keep the system stable.

Fram cooling system maintenance 12.07.22

We plan to physically investigate the cooling system issues with vendors on the 12th of July 2022. This means that the entire Fram cluster will be switched off and unavailable. It will take at least one full day to stabilize the cooling and bring the system back into production. The date might change.

Update 11 July 2022 10:40: This maintenance is postponed to at least next week due to transport challenges.

Fram: slurm crashed

The Slurm controller on Fram has crashed; we are investigating.

Update 08.07.2022 14:00: The Fram workload manager node (Slurm) crashed again and all running jobs have died.

A hardware issue has been discovered; we are in contact with the vendor and running tests. We will keep you updated.

Access to the login nodes will remain open until the planned Fram downtime.