Deletion of data from work folder on Betzy

Because the filesystem on Betzy is completely full, we have had to change the age limit on data in the work folders. We are now deleting all data that is older than 17 days from the work folders. This problem will persist until the new NIRD is in production. We hope we will not have to move the limit further, but be aware that data in the work folder is not backed up and could be deleted at any point. Important data should therefore be moved to either your home directory or a project folder.
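To see which of your files might be affected, something like the following sketch can help. It assumes that age is measured by modification time (the exact criterion used by the cleanup is not stated here) and that your work folder is /cluster/work/users/$USER; both are assumptions, and the helper name list_old_files is only illustrative.

```shell
#!/bin/bash
# List regular files under a directory that have not been modified
# for more than a given number of days.
# Usage: list_old_files <directory> <days>
list_old_files() {
    local dir=$1 days=$2
    # -mtime +N matches files whose age, counted in whole 24-hour
    # periods, is greater than N.
    find "$dir" -type f -mtime +"$days" -print
}

# WORK_DIR is an assumption; set it to the path of your own work folder.
WORK_DIR=${WORK_DIR:-/cluster/work/users/${USER:-unknown}}

if [ -d "$WORK_DIR" ]; then
    list_old_files "$WORK_DIR" 17
fi
```

Because the limit may change, it is safer to run this with a smaller number of days than the current limit and to move anything important well before it reaches that age.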

Updated job statistics in slurm-NNN.out

[2023-01-10 15:40]

We have now updated the job statistics that are printed at the end of the slurm-NNN.out files. The output is updated on Saga and Fram, and will be updated on Betzy shortly.

We hope the new output is more readable, understandable and usable.

It is possible to get the same output on the terminal with the command jobstats -j <jobid> (note: this only works for jobs that are finished). The jobstats command also has a --verbose switch, which produces more detailed output, hints and comments (this will be expanded over time).

We have tested the changes on all clusters, but errors can happen, so if you spot any errors and/or missing output in your jobs, please let us know.

Small config change in queue system

[UPDATE, 2022-12-19 12:50] The change has been implemented on Saga too now.

2022-12-07 12:50

We have now made a small change in the configuration of the queue system on Betzy and Fram. With this change, if one of the processes started by “srun” in a job fails (for instance due to a segmentation fault), “srun” will kill the remaining processes of that job step (just like “mpirun” does). Previously, the remaining processes were left running, possibly until the job timed out. This should solve many of the cases where jobs that fail do not get terminated, but continue until they time out.

The same change will be applied to Saga in about two weeks.

The new behaviour is especially useful when combined with having “set -e” or “set -o errexit” earlier in the job script, because then Slurm will terminate the whole job when an “srun” exits due to one of its processes failing.
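The effect of “set -o errexit” can be tried out without Slurm. The following is a minimal sketch in which the command false stands in for an “srun” call that exits with a non-zero status because one of its processes failed; the miniature job body and the variable names are purely illustrative.

```shell
#!/bin/bash
# Miniature "job script" body. `false` is a stand-in for an srun call
# that exits non-zero because one of its processes failed.
job_body='
echo "step 1: srun stand-in is about to fail"
false
echo "step 2: post-processing"
'

# Without errexit, the script carries on after the failing command.
without_errexit=$(bash -c "$job_body")

# With errexit, the script stops at the failing command and returns
# a non-zero exit status.
with_errexit=$(bash -c "set -o errexit; $job_body") || status=$?

echo "without errexit:"
echo "$without_errexit"
echo "with errexit (exit status ${status:-0}):"
echo "$with_errexit"
```

In a real job script the same pattern would be “set -o errexit” near the top, followed later by the actual “srun” command for your program.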

If you want the old behaviour of “srun”, you can override the configuration by using “srun --kill-on-bad-exit=0” instead of just “srun”.

Urgent maintenance stop on Betzy from 2022-11-21

[UPDATE, 2022-11-25 10:00: Maintenance is finished and Betzy will be back in production by 12:00]

[UPDATE, 2022-11-21 09:00: Maintenance has started]

Dear users,

There will be an urgent maintenance stop on Betzy next week, starting Monday 2022-11-21 at 10:00. The stop will last until Thursday 2022-11-24 at 18:00. The maintenance concerns important system upgrades.

We apologize for the short notice on this downtime.

Backup of Fram and Betzy project areas stopped

2022-11-11

Because the file system used to store Fram and Betzy project area backups on Saga is full, we have had to stop any further backup of Fram and Betzy project areas (i.e., /cluster/projects/nnXXXXk on Fram and Betzy).

Already backed up files are still stored, but no new or changed files will be backed up.

The backup will be re-enabled when enough data has been migrated from Saga to the new NIRD storage.