Major upgrade of Fram from 2023-10-16 08:00 CET to 2023-10-25 15:06 CET
Dear Fram user,
The operating system (OS) upgrade on Fram is now done, and we have opened the cluster for jobs again. Fram now runs Rocky Linux 9.2 as its base operating system, together with Slurm version 23.02.
This has been a major upgrade: in fact, all machines in Fram have been reinstalled from scratch. Some things have therefore inevitably changed, and some things will not work yet. We will fix problems as they come up, so please report any problems you see.
The SSH host keys of the login nodes have been updated. The fingerprints of the new keys are listed here: https://documentation.sigma2.no/getting_started/fingerprints.html
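If your SSH client refuses to connect because the stored host key no longer matches, a common fix with OpenSSH is to remove the old entry from your known_hosts file and then check the new fingerprint against the published list before accepting it. This is a sketch that assumes OpenSSH on your own machine and that you log in via fram.sigma2.no; "username" is a placeholder for your own user name.

```shell
# Remove the outdated Fram host key from ~/.ssh/known_hosts
# (assumes you connect via fram.sigma2.no; repeat for any other Fram
# host names you use, e.g. login-3.fram.sigma2.no)
ssh-keygen -R fram.sigma2.no

# Reconnect; before answering "yes", compare the fingerprint ssh prints
# with the one on the fingerprints page ("username" is a placeholder)
ssh username@fram.sigma2.no
```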
Because the base OS changed, the software modules have been reinstalled. Note that only the EasyBuild toolchain versions 2021b and newer have been installed, so you might have to update your job scripts to use newer modules. Also note that not all software has been reinstalled yet. Please let us know if you are missing some programs.
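To see which versions of a package were reinstalled before you edit your job scripts, you can search the module system. A sketch, assuming Fram uses Lmod (which provides "module spider"); GROMACS is only an example package name:

```shell
# Search all installed versions of a package (example name; use your own)
module spider GROMACS

# List the modules that can be loaded in the current environment
module avail
```

Any module in your scripts that belongs to a toolchain older than 2021b will need to be replaced by one of the versions these commands list.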
As a special case, VASP 5.4.4 and VASP 6.4.2 are now directly available (there is no need to load VASPModules or VASPExtra beforehand). The available binaries are the standard (vasp_std), noncollinear (vasp_ncl), and gamma-point-only (vasp_gam) versions. Further modules will be introduced shortly. Other modules and adjustments can be provided upon special request.
The change of base OS also means that you have to recompile any software that you compiled before the upgrade.
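A minimal rebuild sketch, assuming your code builds with make, that its Makefile has a "clean" target, and that a foss/2022a toolchain module is available (foss/2022a is only an example of a 2021b-or-newer toolchain; check with "module avail foss" first):

```shell
# Start from a clean module environment
module purge
# Load a newer compiler/MPI toolchain (example version; adjust as needed)
module load foss/2022a
# Discard objects built against the old OS, then rebuild
make clean
make
```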
The new version of Slurm should mostly work like the old one. One notable change concerns “srun” inside jobs submitted with “sbatch” or “salloc”: if you have specified “--cpus-per-task=N” for the job, you must also specify “--cpus-per-task=N” (or “-c N” for short) on the “srun” command line for each task to get access to exactly N cores. By default, each task gets access to all cores the job has on its node. Alternatively, you can use the “--cpu-bind” option to bind tasks explicitly. See “man srun” for details.
Another change is that both “squeue” and “scancel” now have a “--me” switch, which is handy for listing or cancelling your own jobs.
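The job script below illustrates the change. It is a sketch under assumptions: the account nnXXXXk, the task and core counts, and the program name ./my_program are placeholders to replace with your own.

```shell
#!/bin/bash
# Placeholder project account; replace with your own
#SBATCH --account=nnXXXXk
#SBATCH --job-name=hybrid-test
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=4

# One OpenMP thread per core allotted to each task
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# Repeat the core count on srun so each task is limited to exactly
# $SLURM_CPUS_PER_TASK cores instead of all cores the job has on the node
srun --cpus-per-task=$SLURM_CPUS_PER_TASK ./my_program
```

Slurm 22.05 and newer also read the SRUN_CPUS_PER_TASK environment variable, so exporting SRUN_CPUS_PER_TASK=$SLURM_CPUS_PER_TASK early in the script should have the same effect for every srun call; check “man srun” on Fram to confirm.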
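A short illustration; note that “scancel --me” without further filters cancels every job you own:

```shell
# Show only your own pending and running jobs
squeue --me

# Cancel all of your own jobs at once (use with care)
scancel --me
```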
Login node 3 will be rebooted at 11:00; the downtime will be just a few minutes.
File access will be unavailable during this event.
This maintenance is now starting as planned.
The maintenance has now started. One of the login nodes, login-3.fram.sigma2.no, is still up, so you can access your files there. However, you must log in to it directly, using
ssh login-3.fram.sigma2.no
or a similar command.
The operating system (OS) on Fram will be upgraded starting October 16. This is a major upgrade, and will take up to two weeks. During the upgrade, Fram will be unavailable.
After the upgrade, you will most likely have to recompile any software that you have compiled on Fram today. If possible, the login nodes will be made available a little before the rest, so you can start testing and recompiling software.
The main reason for the upgrade is that the current OS is very old and will soon stop receiving security updates.
During the downtime, we will upgrade the OS on Fram to the latest version of Rocky Linux. (Rocky Linux is a Red Hat clone, just like CentOS, which Fram runs today.) We will also upgrade the queue system (Slurm).
The software available via “module load” will be reinstalled. We will install the 2021b toolchain and newer (i.e., the four latest versions). This means that not all old versions of all software will be available after the upgrade. If you use software older than this, it would be a good idea to start using newer versions now, if possible. Please also note that some software packages might not be reinstalled by the time Fram goes back into production. They will be handled as soon as possible.
Also note that the SSH host keys on the Fram login nodes will change. We will document the new key fingerprints when Fram reopens after the upgrade.
When the upgrade is done, Fram will still run the same “flavour” of Linux (Red Hat) as now, so you should be able to work mostly as you do today. However, some commands and tools will have changed or will work differently, so you will probably have to make some adjustments.