--gpus-per-task not working correctly on Saga

We have recently discovered that using '--gpus-per-task' on Saga leads to incorrect accounting within the Slurm system. This has two effects. First, the job will not be scheduled as quickly as it should, because Slurm thinks the job requires more resources than it asked for. Second, the job will be charged more project hours than it should.

This is a bug in the Slurm batch system which we are trying to fix as quickly as possible.

For now, we recommend that all GPU users revert to '--gpus' or '--gpus-per-node', which we have verified behave as they should.
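
As an illustration of the workaround, here is a minimal sketch of a job script that requests a GPU with '--gpus' instead of '--gpus-per-task'. The account name, partition, time limit and resource amounts are placeholders we have assumed for the example, so replace them with the values for your own project. The final lines only print which GPUs the job can see; your actual workload would go there.

```shell
#!/bin/bash
# Sketch only: account, partition, time and memory are assumed placeholder values.
#SBATCH --job-name=gpu-example
#SBATCH --account=nnXXXXk        # hypothetical project account, use your own
#SBATCH --partition=accel        # assumed GPU partition name
#SBATCH --time=00:10:00
#SBATCH --ntasks=1
#SBATCH --mem-per-cpu=4G
#SBATCH --gpus=1                 # total GPUs for the job, used instead of --gpus-per-task
# Alternative: request GPUs per node instead of per job
##SBATCH --gpus-per-node=1

# Slurm normally exports CUDA_VISIBLE_DEVICES for GPU jobs; fall back to "none" if unset.
gpu_list="${CUDA_VISIBLE_DEVICES:-none}"
echo "GPUs visible to this job: ${gpu_list}"
```

If your existing script contains a '#SBATCH --gpus-per-task=N' line, replacing it with '--gpus' or '--gpus-per-node' as shown should avoid the accounting problem until the bug is fixed. Note that '--gpus' gives the total for the whole job, so multi-task jobs may need the number adjusted.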

[Updated] Batch system issue on Betzy

There is currently an issue with the batch system on Betzy that results in jobs not completing and new jobs not being started.

We are investigating the issue and will post an update once we know what caused it and how it can be resolved.

[Update 14:22]: Job submission is working again. The users who experienced this were unfortunately affected by a batch system restart that happened at the same time as their jobs were submitted.