Dear Betzy user: The filesystem is yet again almost full. This causes substantial performance degradation.
This is how you can help:
Please have a look at your own and your project's files on Betzy and delete all files that are not strictly needed.
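One way to find out where the space goes is to rank the subdirectories of a folder by size. The following is a minimal sketch and not an official tool: the function name largest_dirs is made up for this example, and it assumes GNU find, du and sort as found on a typical Linux login node.

```shell
# Hypothetical helper: list the largest immediate subdirectories of a
# directory, biggest first. Output is "size in KiB<TAB>path".
#   $1  directory to inspect
#   $2  number of entries to show (defaults to 10)
largest_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -exec du -sk {} + \
        | sort -rn \
        | head -n "${2:-10}"
}

# Example call (the path is a placeholder, replace it with your own folder):
#   largest_dirs /path/to/your/project/folder 5
```

Running du on a very large directory tree creates load on the filesystem itself, so it is probably best to start with the subdirectories you suspect are big rather than scanning everything.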
Due to a completely full filesystem on Betzy, we have had to change the age limit on data in the work folder on Betzy. We are now deleting all data that is older than 17 days from the work folders. This problem will persist until we have the new NIRD up in production. We hope we will not have to change the limit further, but be aware that data in the work folder is not backed up and could be deleted at any point. Important data should therefore be moved to either your home directory or a project folder.
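To see which of your files are close to or past the age limit, something like the sketch below may help. It is only an illustration: the function name list_old_files is made up, and it assumes that the modification time is what counts, which this notice does not state. If the cleanup uses a different timestamp, the list will differ.

```shell
# Hypothetical helper: print regular files under a directory that were last
# modified more than DAYS * 24 hours ago.
#   $1  directory to search
#   $2  age in days (defaults to 17, the limit mentioned above)
# Assumption: age is measured by modification time (mtime).
list_old_files() {
    find "$1" -type f -mmin "+$(( ${2:-17} * 24 * 60 ))"
}

# Example call (the path is a placeholder, replace it with your work folder).
# A lower number of days gives you some margin before the limit is reached:
#   list_old_files /path/to/your/work/folder 14
```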
[2023-01-10 15:40]
We have now updated the job statistics that are printed at the end of the slurm-NNN.out files. The output is updated on Saga and Fram, and will be updated on Betzy shortly.
We hope the new output is more readable, understandable and usable.
It is possible to get the same output on the terminal with the command jobstats -j <jobid> (note: this only works for jobs that are finished). The jobstats command also has a --verbose switch which will produce more detailed output, hints and comments (this will be expanded over time).
We have tested the changes on all clusters, but errors can happen, so if you spot any errors and/or missing output in your jobs, please let us know.
During the next few days we will test and possibly change some configuration settings for InfiniBand on Betzy. It should not affect jobs, but some compute nodes might become unavailable for short periods of time.
Dear Betzy user. We will update firmware on some of the Betzy storage servers Wednesday (2022-12-14) morning between 08:00 and 10:00. Jobs should not be affected, but you might experience slow access to files for a few minutes at a time.
We apologize for the inconvenience.
[UPDATE, 2022-12-19 12:50] The change has been implemented on Saga too now.
2022-12-07 12:50
We have made a small change in the configuration of the queue system on Betzy and Fram now. The change has the effect that if one of the processes started by “srun” in a job fails (for instance due to a segmentation fault), “srun” will now kill the remaining processes of that job step (just like “mpirun” does). Previously, the remaining processes were left running, possibly until the job timed out. This should solve many of the cases where jobs that fail do not get terminated, but continue until they time out.
The same change will be applied to Saga in about two weeks.
The new behaviour is especially useful when combined with having “set -e” or “set -o errexit” earlier in the job script, because then Slurm will terminate the whole job when an “srun” exits due to one of its processes failing.
If one wants the old behaviour of “srun”, one can override the configuration by using “srun --kill-on-bad-exit=0” instead of just “srun”.
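The combination of “set -o errexit” and “srun” described above could look roughly like the sketch below. The snippet writes a minimal job script to a file called errexit_demo.sh. The file name, the account nnXXXXk, the resource requests and the program ./my_program are all placeholders for illustration and must be adapted to your own project and to the limits of the cluster you use.

```shell
# Write a minimal example job script to errexit_demo.sh.
# The quoted 'EOF' keeps the shell from expanding anything inside the script.
cat > errexit_demo.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=errexit_demo
#SBATCH --account=nnXXXXk
#SBATCH --nodes=1
#SBATCH --time=00:10:00

# Stop the job script as soon as any command returns a non-zero exit status.
set -o errexit

# If one process of this job step fails, srun now kills the rest of the step
# and returns a non-zero status, so errexit ends the job here.
# (Use "srun --kill-on-bad-exit=0" to get the old srun behaviour instead.)
srun ./my_program

# Only reached when the srun step above succeeded.
echo "post-processing"
EOF

# Submit it with: sbatch errexit_demo.sh
```

Without “set -o errexit”, the script would continue to the post-processing line even when the srun step has failed.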
2022-11-28
Most of the NRIS staff is busy with an NRIS all-hands meeting this week, so we will have less capacity to handle support issues. But we will try our best to answer questions.
[UPDATE, 2022-11-25 10:00: Maintenance is finished and Betzy will be back in production by 12:00]
[UPDATE, 2022-11-21 09:00: Maintenance has started]
Dear users,
There will be an urgent maintenance stop on Betzy next week, starting Monday 2022-11-21 at 10:00. The stop will last until Thursday 2022-11-24 at 18:00. The maintenance concerns important system upgrades.
We apologize for the short notice on this downtime.
2022-11-11
Because the file system used to store Fram and Betzy project area backups on Saga is full, we have had to stop any further backup of Fram and Betzy project areas (i.e., /cluster/projects/nnXXXXk on Fram and Betzy).
Already backed up files are still stored, but no new or changed files will be backed up.
The backup will be re-enabled when enough data has been migrated from Saga to the new NIRD storage.
Dear Betzy users,
One of the HPC projects on our infrastructure will be performing a large-scale benchmark on Friday 4 November 09:00 – 19:00. Approximately 1024 nodes will be reserved for this purpose. Once the benchmarking is concluded the machine will resume normal production.