Fram maintenance 27th July

The postponed Fram cooling maintenance will take place from Wednesday 27th July at 07:00 until Thursday 28th July at 20:00. Depending on the amount of work and diagnostics needed, it may be shorter or longer.

The maintenance will be conducted to diagnose and mitigate the recent cooling issues and related crashes we have seen on Fram. We hope it will give us a better view of the situation and an action plan for making the whole cluster available for production again. Fram is currently running at 80% capacity and will stay at that level until the maintenance to keep the system stable.

Fram availability at 50%

Dear Fram user, we are in the process of diagnosing and fixing several issues with the Fram supercomputer. We have identified at least two hardware issues related to the cooling equipment, as well as a faulty CPU in a non-redundant server (the queueing server). We have temporarily mitigated these issues and are slowly bringing the system back into a stable configuration.

Fram will run at 50% capacity until Tuesday 10:00, then at approximately 75% capacity until Wednesday 10:00, when we will attempt to run at 100%.

A maintenance window is still needed to fix both the faulty CPU and the cooling capacity issues. We will update these pages when we know more about the maintenance.

UPDATE: Fram availability is now at 80%

[DONE] Betzy downtime 13th June – 15th June

[Update, 2022-06-14]: The maintenance stop finished late last night.

UPDATE: Downtime has started

Downtime starts 13th June at 08:00 and lasts until 15th June at 16:00.

A fault has been discovered in one of the switches in the main power board for the Betzy compute nodes. Downtime is required to replace the switch. We will take the opportunity to carry out further hardware and software maintenance, and to implement and test an “emergency shutdown/reset” procedure for the whole of Betzy.

No services that require access to any part of the system, including login nodes and storage services (NFS-exported directories and backup directories), will be available during the downtime. However, some parts of the system (mainly storage and login) may be down for a shorter period than others.

Fram downtime 23rd – 24th February

[Update, 2022-02-24 22:30]: The maintenance is over and Fram is in production again. Thank you for your patience!

[Update, 2022-02-24 20:30]: The maintenance is taking a little longer than planned. We expect to be back in production at 22:00.

[Update, 2022-02-23 12:00]: The maintenance stop has now begun.

The Fram supercomputer will be unavailable due to maintenance on the cooling system from 23rd February at 12:00 until 24th February at 20:00.

If time allows, we will also upgrade all or parts of the storage system, including the file system clients (compute nodes).