Reduced capacity on Fram on 2nd February 2023 due to maintenance on cooling units

Dear Fram user, we have to postpone the maintenance on the Fram cooling system to 2nd February due to unexpected internal complications at the service vendor. We apologize for the inconvenience this may cause.

While the maintenance is ongoing, fewer compute nodes will be available.

NIRD-TRD storage experiencing trouble

Parts of the NIRD-TRD file systems are unavailable. We are working with the vendor on a solution. It may be necessary to take the whole system offline at short notice.

Update 2022-10-17, 14:47: Unfortunately, we have to shut down most of the NIRD Trondheim cluster now. The system may be unavailable until tomorrow; we will keep you informed about its availability.

This affects the following services:
– Data storage for projects with primary site in Trondheim
– Services running on the service platform in Trondheim
– NIRD mounts on Saga and Betzy

Update 2022-10-17, 17:30: NIRD-TRD and the attached services are back in production.

Fram maintenance 27th July

The postponed Fram cooling maintenance will take place from Wednesday 27th July at 07:00 until Thursday 28th July at 20:00. Depending on the amount of work and diagnostics needed, it may take less or more time.

The purpose of the maintenance is to diagnose and mitigate the recent cooling issues and the related crashes we have seen on Fram. Afterwards, we hope to have a clearer view of the situation and an action plan for returning the whole cluster to production. Fram is currently running at 80% capacity and will stay at that level to keep the system stable.

Fram availability at 50%

Dear Fram user, we are diagnosing and fixing several issues with the Fram supercomputer. We have identified at least two hardware-related issues with the cooling equipment, as well as a faulty CPU in a non-redundant server (the queueing server). We have temporarily mitigated these issues and are slowly bringing the system back to a stable configuration.

Fram will run at 50% capacity until Tuesday 10:00, then at approximately 75% capacity until Wednesday 10:00, when we will attempt to run at 100%.

A maintenance window is still needed to fix both the faulty CPU and the cooling capacity issues. We will update these pages when we know more about the maintenance.

UPDATE: Fram availability is now at 80%