Update, 2022-12-07 12:45: The problem with the /cluster storage system has been identified and fixed, and the file system should work as normal again.
Update, 2022-12-07 09:30: Compute nodes and login nodes are up, and Fram is running jobs, but we are experiencing problems with the /cluster storage system. This shows up as occasional hangs (up to minutes) and/or Input/Output errors. It appears to affect all nodes (login or compute).
Update 2022-12-06 11:32: We have found the cause and restored most services. Still looking into some potential file system issues.
[2022-12-06 09:30] Fram is currently down. We are still investigating, but currently it looks like a gateway router has gone down. We will update this post as we know more.
[UPDATE, 2022-11-25 10:00: Maintenance is finished and Betzy will be back in production by 12:00]
[UPDATE, 2022-11-21 09:00: Maintenance has started]
There will be an urgent maintenance stop on Betzy next week, starting Monday 2022-11-21 at 10:00. The stop will last until Thursday 2022-11-24 at 18:00. The maintenance concerns important system upgrades.
We apologize for the short notice on this downtime.
There is a problem with the cooling system on Fram, which leads to many compute nodes automatically shutting down. We are investigating and working on the problem.
Update 19.10.22 – 09:29
Fram is back in production. The problems we experienced yesterday were caused by a small power outage in Tromsø.
Sorry for the inconvenience this may have caused.
– Infrastructure Team
It appears that the cooling on Fram has failed. The result is that many compute nodes are unavailable. We are investigating.
[15-06-2022 – 15:10] – Cooling is fixed and the compute nodes are back in production. Sorry for the inconvenience this has caused.
Most of the compute nodes crashed around 11:00 today due to a cooling problem. Most nodes are up again and running jobs. We are investigating what happened.
Dear LUMI users,
As the system continues to grow, it is necessary to perform a maintenance break for extensive upgrades. The current plan is to start on Monday 6 June and continue for about 4 weeks.
Unfortunately access to the system won’t be possible during the downtime.
If you need any assistance, please do not hesitate to contact the LUMI User Support Team: https://lumi-supercomputer.eu/user-support/need-help/
Read the full service announcement from LUST (external)
[UPDATE, 2022-05-13 08:00] The maintenance stop is over. There may still be some file system issues. Please report any problems via the regular support channels.
[UPDATE, 2022-05-11 08:00] The maintenance stop has now started.
There will be maintenance of the storage system on the Fram supercomputer. The cluster will be unavailable from the 11th of May at 08:00 until the 12th of May at 20:00.
[Update, 2022-04-30 11:10] The Fram and Saga maintenance is now over, and jobs are running again.
[Update, 2022-04-29 08:00] The Fram and Saga maintenance stops have now started.
[Update, 2022-04-28 12:56] The Betzy maintenance is now over, and jobs are starting again.
[Update, 2022-04-28 08:00] The Betzy maintenance has now started.
There will unfortunately be maintenance stops on all NRIS clusters next week for an important security update. The maintenance stops will be:
- Betzy: Thursday, April 28 at 08:00
- Fram and Saga: Friday, April 29 at 08:00
We expect the stops will last a couple of hours. We have set up maintenance reservations on all nodes on the clusters, so jobs that would have run into the reservation will be left pending in the job queue until after the maintenance stop.
We are sorry for the inconvenience this creates. We had hoped to be able to apply the security update with jobs running, but that turned out not to be possible.
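The effect of a maintenance reservation on queued jobs can be illustrated with a minimal sketch. It assumes a simplified rule: a job may start only if its requested wall time ends before the reservation begins. The function name and the submission time are hypothetical, and the real scheduler takes many more factors into account.

```python
from datetime import datetime, timedelta


def can_start_before_reservation(now, walltime, reservation_start):
    """Return True if a job started at `now` would end before the reservation begins.

    Simplified assumption: only the requested wall time matters.
    """
    return now + walltime <= reservation_start


# Start of the Betzy maintenance stop, taken from the announcement above.
reservation_start = datetime(2022, 4, 28, 8, 0)
# Hypothetical time at which the scheduler considers the jobs.
now = datetime(2022, 4, 27, 20, 0)

for walltime in (timedelta(hours=6), timedelta(hours=24)):
    if can_start_before_reservation(now, walltime, reservation_start):
        state = "can start"
    else:
        state = "stays pending until after the stop"
    print(f"Job requesting {walltime}: {state}")
```

In this sketch, a job that requests a shorter wall time may still fit in before the stop, while a longer one waits in the queue.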
[Update, 2022-02-24 22:30]: The maintenance is over and Fram is in production again. Thank you for your patience!
[Update, 2022-02-24 20:30]: The maintenance is taking a little longer than planned. We plan to get back into production at 22:00.
[Update, 2022-02-23 12:00]: The maintenance stop has now begun.
The Fram supercomputer will be unavailable due to maintenance on the cooling system from February 23rd at 12:00 until February 24th at 20:00.
If time allows, we will also upgrade all or parts of the storage system, including the file system clients (compute nodes).
The login node login-1 on Fram has crashed and is currently being rebooted. It should be back in a few minutes.