Maintenance stop to extend the GPU partition from 2026-02-25 07:00 CET to 2026-02-27 14:46 CET
Updates
Dear Users, the Olivia GPU partition has been extended and the system has been opened for production.
Have a great weekend!
/NRIS Team
The upgrade has gone as planned, but the remaining benchmarks are taking longer than anticipated.
We believe they should complete within the day, and are therefore extending the downtime by 4 hours (until 22:00).
Apologies for the inconvenience!
/NRIS Team
There will be a three-day downtime of Olivia February 25 – 27 to extend the “accel” partition with new GPU nodes. After the downtime, Olivia will have 36 new GPU nodes.
Also during the downtime, there will be a change in the CPU partition (“normal”):
Currently, Olivia has two partitions: “normal” (the default) for CPU jobs and “accel” for GPU jobs.
There are problems when multiple jobs run on the same node, and in particular with jobs that span more than one node. This mainly affects the “normal” partition, not the “accel” partition. We are working with the vendor to overcome these problems, but for now we do not have a time estimate for when they will be fixed.
As a remedy, the “normal” partition on Olivia will be split into two: “small” and “large”. The “small” partition is for jobs that need less than 256 CPUs, and the “large” partition is for jobs requiring at least 256 CPUs. The “normal” partition will be removed.
The “small” partition will work as the “normal” partition on Olivia does today: jobs ask for CPUs (--ntasks, --cpus-per-task, etc.) and memory (--mem-per-cpu, etc.), and more than one job can run on the same node.
It will also be the default partition (like “normal” is today), so if no --partition is specified, the job will go into the “small” partition.
The difference is that the “small” partition will only accept jobs that run on a single node, so one cannot ask for more than 256 CPUs or more than one node.
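As an illustration only (the project account, program name, and resource numbers below are placeholders, not taken from this announcement), a job script for the new “small” partition could look much like a “normal” job script does today:

    #!/bin/bash
    #SBATCH --account=nnXXXXk       # placeholder: replace with your project account
    #SBATCH --job-name=small-job
    #SBATCH --partition=small       # optional, since "small" will be the default
    #SBATCH --ntasks=64             # must fit on one node, i.e. at most 256 CPUs
    #SBATCH --mem-per-cpu=2G        # memory is still requested per CPU
    #SBATCH --time=01:00:00

    srun ./my_program               # hypothetical executable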
The “large” partition will work like the “normal” partition on Betzy does (and Fram did): jobs only ask for one or more nodes (--nodes), and will get whole nodes and run alone (exclusive) on their nodes. Jobs get access to all CPUs and all the memory on the nodes, and it is not necessary or possible to specify --mem-per-cpu or similar.
Even if the job specifies --ntasks-per-node less than 256 (which is useful in combination with --cpus-per-task), the job will get access to, and be accounted for, all the CPUs on its nodes.
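A sketch of a “large” partition job script, under the same caveats (placeholder account and program, illustrative node and task counts), could look like this:

    #!/bin/bash
    #SBATCH --account=nnXXXXk       # placeholder: replace with your project account
    #SBATCH --job-name=large-job
    #SBATCH --partition=large
    #SBATCH --nodes=4               # whole nodes, allocated exclusively
    #SBATCH --ntasks-per-node=32    # fewer tasks than CPUs per node is allowed ...
    #SBATCH --cpus-per-task=8       # ... e.g. for hybrid MPI + OpenMP jobs
    #SBATCH --time=01:00:00
    # No --mem-per-cpu here: the job gets (and is accounted for) all CPUs and memory on its nodes

    export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
    srun ./my_program               # hypothetical executable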
The “accel” partition is not affected by this change.
The user documentation will be updated when the partitions are changed.