Job arrays on our clusters are limited to a maximum of 1,000 array tasks. However, we have now increased the maximum allowed array task ID to 100,000. This means that “--array=900-1100” is now allowed, but “--array=1-1100” is still not, since it contains 1,100 tasks.
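To make the two limits concrete, here is a minimal Bash sketch that checks an array specification against them: at most 1,000 tasks, and no task ID above 100,000. The function name array_spec_ok is hypothetical. It only understands comma-separated entries of the form N, A-B and A-B:S with an optional %N throttle suffix, and it illustrates the rule rather than reproducing the check Slurm itself performs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): check an --array spec against the
# limits above, at most 1,000 tasks and no task ID above 100,000.
# Supports comma-separated entries N, A-B or A-B:S plus an optional %N suffix.
array_spec_ok() {
    local spec=${1%%\%*}              # drop an optional "%N" throttle suffix
    local max_tasks=1000 max_id=100000
    local count=0 entry first last step
    local IFS=','                     # split the spec on commas in the loop
    for entry in $spec; do
        if [[ ! $entry =~ ^([0-9]+)(-([0-9]+))?(:([0-9]+))?$ ]]; then
            echo "unsupported entry: $entry" >&2
            return 1
        fi
        first=$((10#${BASH_REMATCH[1]}))
        last=$((10#${BASH_REMATCH[3]:-${BASH_REMATCH[1]}}))
        step=$((10#${BASH_REMATCH[5]:-1}))
        if (( step == 0 || last < first )); then
            echo "invalid range: $entry" >&2
            return 1
        fi
        # Highest ID actually generated, e.g. 1-10:4 yields 1, 5 and 9.
        last=$(( first + (last - first) / step * step ))
        if (( last > max_id )); then
            echo "task ID $last exceeds $max_id" >&2
            return 1
        fi
        count=$(( count + (last - first) / step + 1 ))
    done
    if (( count == 0 || count > max_tasks )); then
        echo "$count tasks is outside 1..$max_tasks" >&2
        return 1
    fi
    echo "ok: $count tasks"
}
```

For example, array_spec_ok 900-1100 prints “ok: 201 tasks”, while array_spec_ok 1-1100 fails because the range covers 1,100 tasks.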
The NVIDIA drivers on the GPU nodes on Betzy will be upgraded from version 465.19.01 (CUDA 11.3) to version 535.129.03 (CUDA 12.2). The upgrade will take place as nodes become available, without a maintenance stop. We expect most software to continue working without issues. Should you encounter any problems or incompatibilities with the new installation, please contact support to discuss options.
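Because nodes are upgraded as they become available, a job may land on a node that still runs the old driver. The following Bash sketch, using the hypothetical function name driver_is_upgraded, reads the driver version through nvidia-smi's query interface (or takes a version string as an argument) and compares only the major version against 535. Treat it as a convenience check for troubleshooting, not an official tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether this GPU node runs the upgraded driver.
# With no argument it asks nvidia-smi; pass a version string to test offline.
driver_is_upgraded() {
    local version=${1:-$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)}
    local major=${version%%.*}    # e.g. "535" from "535.129.03"
    if [[ ! $major =~ ^[0-9]+$ ]]; then
        echo "could not determine the driver version" >&2
        return 2
    fi
    # Assumption: any major version >= 535 counts as upgraded.
    if (( 10#$major >= 535 )); then
        echo "upgraded driver ($version)"
    else
        echo "old driver ($version)"
        return 1
    fi
}
```

Calling driver_is_upgraded without arguments inside a GPU job records which driver that particular job ran on, which can help when reporting a problem to support.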
Due to a conference on 28-30 November, our support resources are limited, and users may experience longer response times for their support requests. We apologize for the inconvenience and will follow up more quickly once the conference is over.
When you submit jobs, sbatch/salloc will now print a warning if the project has less than 10 % of its CPU-hour quota left. Hopefully this will help projects avoid running out of quota.
If needed, you can disable this warning by setting the environment variable SLURM_SUBMIT_SUPPRESS_QUOTA_WARNING to 1 in your shell (for instance in one of the Bash startup files). You can also disable all the warnings we add when submitting jobs by setting SLURM_SUBMIT_SUPPRESS_WARNINGS to 1 ...
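As a minimal example, assuming you use Bash, the variables mentioned above could be set in ~/.bashrc as shown here; the second export is commented out so that only the quota warning is suppressed. For instance, with a hypothetical quota of 100,000 CPU hours, the warning would start appearing once fewer than 10,000 CPU hours remain.

```shell
# In ~/.bashrc (or another Bash startup file):
# suppress only the low-quota warning printed by sbatch/salloc
export SLURM_SUBMIT_SUPPRESS_QUOTA_WARNING=1

# or suppress every warning added at job submission
# export SLURM_SUBMIT_SUPPRESS_WARNINGS=1
```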
Slurm has been upgraded on all clusters. On Saga, it was a minor upgrade, whereas on Fram and Betzy, it was a major version upgrade.
The new versions should mostly work like the old ones, with one notable change:
“srun” no longer inherits “--cpus-per-task=n” from “sbatch”. This means that if you submit a job with “sbatch --cpus-per-task=n” or “#SBATCH --cpus-per-task=n”, you must now run “srun” with “srun --cpus-per-task=n …” or put “export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK” in the job...
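A minimal job script sketch showing both workarounds follows. The job name, the account nnXXXXk, the resource numbers and the program ./my_program are placeholders, and any other options your cluster may require (partition, memory) are left out.

```shell
#!/bin/bash
# Placeholders: replace the job name, account, resources and program.
#SBATCH --job-name=cpt-example
#SBATCH --account=nnXXXXk
#SBATCH --time=00:10:00
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=8

# srun no longer inherits --cpus-per-task from sbatch.
# Option 1: export it once; every srun in this script then uses it.
export SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK
srun ./my_program

# Option 2: pass it explicitly to a single srun instead of exporting:
# srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./my_program
```

Exporting SRUN_CPUS_PER_TASK is convenient when a script contains several srun calls, while passing “--cpus-per-task” explicitly keeps each job step self-documenting.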
The NRIS bi-annual workshop takes place on June 6-8, 2023.
Response and resolution times for support tickets will be affected during the event.
We have an all-hands meeting this week. This means that responses to support tickets may be slower than usual.
Due to a problem with the new modules, we have reverted to the previous modules. Please ignore the commands below and use the module command as usual.
Sorry for the inconvenience.
One rack on Fram is out of service due to a water leak.
Until this is resolved, the overall node capacity of Fram is reduced by 60 nodes.
Sorry for the inconvenience this may cause.
-Infra Team