There will be a maintenance break on Saga to apply patches to the file system. Jobs will not run on 30.05.2022 from 08:00 until some time after lunch. Jobs can still be submitted, and jobs short enough to finish before the break will run. Otherwise, they will start automatically once the maintenance break is finished.
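As an illustration (assuming the standard Slurm setup on Saga, with the maintenance break implemented as a scheduler reservation), a job whose requested walltime ends before the break begins can still be scheduled ahead of it. The script name and time limit below are hypothetical:

```shell
# Hypothetical example: request a short walltime so the scheduler can
# fit the job in before the maintenance reservation starts at 08:00.
sbatch --time=01:00:00 job.sh

# Jobs with longer time limits stay queued and start automatically
# once the maintenance break is over; check estimated start times with:
squeue --start --me
```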
[UPDATE, 2022-05-11]: Yesterday’s loss of power was due to a major power outage in the city of Trondheim.
[UPDATE, 2022-05-12 10:25] Most nodes are now up and running as normal.
[UPDATE, 2022-05-12 08:50]: There was a power outage on Betzy at around 23:30 last night, which made all compute nodes go down. We are working on getting the nodes up and back into production now.
It appears that most or all of Betzy is down right now. We are investigating.
Some users are experiencing a slow file system on Fram.
We are working to resolve the issue, but it occurs intermittently, which makes debugging difficult.
Sorry for the inconvenience this is causing.
Dear Fram users. Unfortunately, there has been a short power outage in Tromsø, causing a shutdown of the compute nodes on Fram. We are working on bringing them back into production as soon as possible.
Sorry for the inconvenience this has caused.
[2022-05-03 – 10:45] – Fram is back in production.
[2022-05-04 – 13:20] – As a result of the power outage, we are having some problems with the Fram file system (slowness/lagging). We are working on fixing this and are sorry for the inconvenience it is causing.
[2022-05-06 – 13:25] The Fram file system is still periodically slow for some users. We assure you that we are continuously working to resolve this issue, but it is hard to debug because the problem is intermittent.
[UPDATE, 2022-05-13 08:00] The maintenance stop is over. There may still be some file system issues. Please report them via the regular support channels.
[UPDATE, 2022-05-11 08:00] The maintenance stop has now started.
There will be maintenance of the storage system on the Fram supercomputer. The cluster will be unavailable from 11 May at 08:00 until 12 May at 20:00.
The queue system configuration of the GPU nodes on Betzy had an error: the number of CPUs was set to 128 instead of 64. Most jobs were probably not affected by this, but it is possible that some jobs got sub-optimal CPU pinning.
This has now been fixed, and the documentation updated. Users do not need to change their job scripts (unless they asked for more than 64 CPUs per node).
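For reference, a minimal job-script sketch that stays within the corrected limit of 64 CPUs per GPU node. The partition name, account, and program are illustrative assumptions, not taken from the announcement:

```shell
#!/bin/bash
# Hypothetical Betzy GPU job: with the corrected configuration,
# CPU requests per node must not exceed the 64 physical cores.
#SBATCH --account=nnXXXXk        # placeholder project account
#SBATCH --partition=accel        # assumed name of the GPU partition
#SBATCH --gpus-per-node=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16       # any value up to 64 per node is valid
#SBATCH --time=00:30:00

srun ./my_gpu_program            # placeholder executable
```

Jobs that previously requested more than 64 CPUs per node relied on the misconfigured value and need their `--cpus-per-task`/`--ntasks-per-node` product reduced accordingly.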
We need to replace an IO module and HDD in Fram storage today.
This should not affect your work and the system will be running as usual during the procedure. If you get any file system issues during this service please send us a support request.
We need to do some work on the file system controllers for NIRD – TOS. Unfortunately, this results in a short unavailability (downtime) period.
All services connected to or utilizing the TOS (Tromsø) part of NIRD will be affected. Exported NFS services mounted on FRAM will unfortunately NOT be available either.
The maintenance is scheduled for Thursday 07.04.2022, 09:00–11:00.
We are sorry for any inconvenience that may occur. Opslog will be updated as soon as the system is back in production.
UPDATE 07-04-2022 – 11:25: We are still working on the issue and are starting to bring the file system up. We hope to be back in production soon.
UPDATE 07-04-2022 – 12:25: We are still struggling with the file system and doing our best. We are very sorry for the trouble this issue is causing you.
UPDATE 07-04-2022 – 15:35: The file system is back up and running.
Update 01.04.2022: File system performance appears to be back to normal.
We are experiencing long login times on the Saga login nodes and poor data access and transfer speeds across the whole cluster. We are working on finding the cause and fixing it.
We have identified that the NIRD mount is unavailable on Saga and Betzy, and we are working on finding the cause and putting a fix in place.
28-03-2022 13:20 – The mounts should be back now. The problem was caused by Friday’s maintenance on network gear …
We hope this has not caused too much frustration, and we wish everyone a very nice day!
NRIS HPC staff