Major upgrade of Saga from 2023-04-17 08:00 CEST to 2023-04-29 11:41 CEST

Scheduled maintenance High Performance Computing Saga
2023-04-17 08:00 CEST · 1 week, 5 days, 3 hours, 41 minutes

Updates

Resolved

Dear Saga user,

Saga has now been upgraded and put back into production.  You should
be able to log in and start working on Saga again.

Please note that the ssh host keys on the login nodes have changed. This means that you may get a warning when logging in. The new fingerprints are:

ed25519: SHA256:YOkZ1uudXrFmaigdnpZ64z497ZccNhdZe/abFkDXOH8 or MD5:2b:c2:ce:c0:f1:b8:0a:95:ec:db:b4:f3:fb:ee:e9:70

ecdsa: SHA256:qirKlTjO9QSXuCAiuDQeDPqq+jorMFarCW+0qhpaAEA or MD5:13:4e:ae:66:89:0d:24:27:b8:15:87:24:31:ed:32:af

rsa: SHA256:mY+Po9LKAlZGzMRHUmq1abrSOohifdN7+5VUmRTW4tE or MD5:61:e4:49:4b:4e:00:14:2d:9d:b9:ac:99:c2:16:e6:ab

The documentation will be updated shortly.
If you are using strict host key checking, you can delete the old Saga keys from your local known_hosts file (e.g. ~/.ssh/known_hosts) before logging in again.
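
As a rough sketch of the clean-up and check described above, the following shell helpers remove the stale entries and compare a scanned fingerprint against the SHA256 values listed in this notice. The host name saga.sigma2.no and the function names are assumptions made for illustration; substitute the host name you normally use to log in.

```shell
# SHA256 fingerprints published in this notice (ed25519, ecdsa, rsa).
SAGA_FINGERPRINTS='SHA256:YOkZ1uudXrFmaigdnpZ64z497ZccNhdZe/abFkDXOH8
SHA256:qirKlTjO9QSXuCAiuDQeDPqq+jorMFarCW+0qhpaAEA
SHA256:mY+Po9LKAlZGzMRHUmq1abrSOohifdN7+5VUmRTW4tE'

# Assumed login host name; adjust to the name you actually connect to.
SAGA_HOST="${SAGA_HOST:-saga.sigma2.no}"

# Remove all stored keys for the host from ~/.ssh/known_hosts.
forget_old_saga_keys() {
    ssh-keygen -R "$SAGA_HOST"
}

# Succeed only if the argument exactly matches a published fingerprint.
is_published_fingerprint() {
    printf '%s\n' "$SAGA_FINGERPRINTS" | grep -qxF -e "$1"
}

# Fetch the host's current ed25519 key and print its SHA256 fingerprint.
scan_saga_fingerprint() {
    ssh-keyscan -t ed25519 "$SAGA_HOST" 2>/dev/null | ssh-keygen -lf - | awk '{print $2}'
}
```

A possible use is to run `forget_old_saga_keys` once, and then `is_published_fingerprint "$(scan_saga_fingerprint)" && echo "fingerprint matches"` before accepting the new key.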

Saga now runs Rocky Linux 9.1 as its base operating system, and Slurm
version 22.05.

This has been a major upgrade; in fact, all machines in Saga have
been reinstalled from scratch.  It is therefore inevitable that some
things have changed, and there will be some things that do not work yet.
We will try to fix problems as they occur.  Please do report problems
you see unless they are already mentioned in the operations log
(https://opslog.sigma2.no/).

Because the base OS changed, the software modules have been
reinstalled.  Note that only the EasyBuild toolchain versions 2021a
and newer have been installed, so you might have to update your job
scripts to use newer modules.  Also note that not all software has
been installed yet.  Please let us know if you are missing some
programs.
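
To help find job scripts that need updating, here is a minimal sketch that lists "module load" lines referring to a toolchain version older than 2021a. It assumes the toolchain version appears in the module name in the usual EasyBuild style (for example a suffix such as "-foss-2020b"); the function name and the module names in the usage note are hypothetical.

```shell
# Print each "module load" line (with its line number) in a job script
# that mentions an EasyBuild toolchain version older than 2021a.
find_old_toolchain_modules() {
    grep -nE 'module[[:space:]]+load' "$1" |
    while IFS= read -r line; do
        # Extract version tags such as 2019b, 2020a, 2022b from the line.
        for version in $(printf '%s\n' "$line" | grep -oE '20[0-9]{2}[ab]'); do
            # The tags sort lexically, so if 2021a is not the smaller of
            # the pair, the version found is older than 2021a.
            oldest=$(printf '%s\n' "$version" 2021a | LC_ALL=C sort | head -n 1)
            if [ "$oldest" != "2021a" ]; then
                printf '%s\n' "$line"
                break
            fi
        done
    done
}
```

Running `find_old_toolchain_modules myjob.sh` on a script that contains, say, `module load Foo/1.0-foss-2020b` would print that line, while a line loading a 2021a or newer module is left out.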

The change of base OS also means that you have to recompile software
that you had compiled before.

The new version of Slurm should mostly work like the old one.  One
notable change is that if you use “srun” in jobs (submitted with
“sbatch” or “salloc”) and you have specified “--cpus-per-task=N” (N > 1)
for the job, you have to specify “--cpus-per-task=N” (or “-c N” for
short) also on the “srun” command line for the tasks to get access to
exactly N cores each.  The default is that they get access to all
cores the job has on each node.  Alternatively, you can use the
“--cpu-bind” option to explicitly bind tasks.  See “man srun” for
details.
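
One way to avoid repeating the number by hand is sketched below: a small wrapper that passes the job's own value on to srun. It relies on the SLURM_CPUS_PER_TASK environment variable, which Slurm sets inside a job when --cpus-per-task was given; the wrapper name srun_per_task and the program name in the usage note are hypothetical.

```shell
# Run a job step with the same per-task core count the job was submitted with.
# SLURM_CPUS_PER_TASK is set by Slurm inside the job when --cpus-per-task
# was given to sbatch or salloc; fall back to 1 core if it is unset.
srun_per_task() {
    srun --cpus-per-task="${SLURM_CPUS_PER_TASK:-1}" "$@"
}
```

In a job script submitted with, for example, `#SBATCH --ntasks=4` and `#SBATCH --cpus-per-task=8`, calling `srun_per_task ./my_program` would then start each task with access to 8 cores.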

Another change is that now both “squeue” and “scancel” have a “--me”
switch, which is handy for listing or cancelling your own jobs.
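
As a small illustration of the new switch, the helper below lists the IDs of your own pending jobs. The function name is hypothetical; the other options used are standard squeue options.

```shell
# List the IDs of your own pending jobs, one per line, without a header.
my_pending_job_ids() {
    squeue --me --states=PENDING --noheader --format='%i'
}
```

Note that a plain `scancel --me` cancels all of your jobs, so it is worth checking the list with `squeue --me` first.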

April 29, 2023 · 11:41 CEST
Update

The maintenance has been progressing very well so far, and we are almost ready to put Saga back into full production. However, some tasks still require final work.

Since 1 May is Labor Day, we are extending the maintenance window until 2 May 16:00. We apologize for the inconvenience.

April 28, 2023 · 22:20 CEST
Update

Issues with accessing the temporary login node (login-tmp.saga.sigma2.no) should be solved now. You can use this to access project data via the path /nird/projects while the maintenance is going on.

April 17, 2023 · 15:09 CEST
Update

We are aware that some users are getting “Permission denied” errors when trying to access the temporary login node on Saga (login-tmp.saga.sigma2.no). This is currently being investigated, and we hope to apply a fix shortly. Another update will be posted once we have confirmed that it is working as expected.

April 17, 2023 · 12:47 CEST
Update

We have set up a temporary login node for this maintenance in order to allow read-only access to project data from Saga.

It can be accessed using SSH and hostname login-tmp.saga.sigma2.no. Data is stored under the path /nird/projects.

This node shall only be used to access data. Please do not perform any computations using it.

April 17, 2023 · 08:59 CEST
Started

This maintenance is now starting as planned.

April 17, 2023 · 08:00 CEST
Scheduled

The operating system (OS) on Saga will be upgraded starting April 17 at 08:00. This is a major upgrade, and will take up to two weeks. During the upgrade, Saga will be unavailable.

After the upgrade, you will most likely have to recompile any software that you have compiled on Saga today. If possible, the login nodes will be made available a little before the rest, so you can start testing and recompiling software.

The main reason for the upgrade is that the current OS is very old, and soon will not receive any more security updates.

During the downtime, we will upgrade the OS on Saga to the latest version of Rocky. (Rocky is a Red Hat clone, just like CentOS, which Saga is running today.) We will also upgrade the file system (BeeGFS) and the queue system (Slurm).

The software available via “module load” will be reinstalled. We will install the 2021a toolchain and newer.

When the upgrade is done, Saga will still run the same “flavour” of Linux (Red Hat) as now, so you should be able to work mostly as you do today. However, some commands and other things will have changed or will work differently, so you will probably have to make some changes.

March 20, 2023 · 08:35 CEST
