[Resolved] UiB MATLAB License server is down

Update 2021-05-10: The UiB MATLAB license server is now up and running again.

Dear users,
We have a problem with the UiB MATLAB license server: it is unstable and crashes from time to time. Users running MATLAB on any of the clusters may have trouble contacting the UiB MATLAB license server.

We are working on this issue and will keep you updated.

We apologise for any inconvenience caused.

Best Regards

[SOLVED] Betzy downtime May 11, 2021

UPDATE: The maintenance stop went well, and Betzy is back in production again.

Dear Betzy users,
We will have a planned downtime on 11.05.2021, from 09:00 to 15:00. During this time we will expand the storage system on Betzy. All compute nodes are reserved, so submitted jobs that cannot finish before the downtime will not start.

Please contact us if you have any questions.
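To estimate whether a job can still start before the downtime, compare its requested time limit with the start of the reservation. The sketch below assumes the scheduler only starts a job if the current time plus the job's time limit ends no later than the downtime start; the function name is hypothetical and the real scheduling decision is made by the batch system, not by this code.

```python
from datetime import datetime, timedelta


def can_start_before_downtime(now, time_limit, downtime_start):
    """Return True if a job started at `now` with the requested
    `time_limit` would end no later than `downtime_start`.

    Assumption: the scheduler compares the requested time limit,
    not the actual run time, against the reservation start.
    """
    return now + time_limit <= downtime_start


# Downtime start taken from the announcement: 11.05.2021 at 09:00.
downtime_start = datetime(2021, 5, 11, 9, 0)
now = datetime(2021, 5, 10, 20, 0)

print(can_start_before_downtime(now, timedelta(hours=12), downtime_start))  # True
print(can_start_before_downtime(now, timedelta(hours=24), downtime_start))  # False
```

A job that is held back this way can often start earlier if it is resubmitted with a shorter time limit.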

Best Regards

Support team.

Fram: compute nodes are down

Dear Fram users,

We have a problem with the Fram compute nodes: about 870 nodes are down for an unknown reason. We are working on the issue and will keep you updated.

Update 2020-12-22, 20:05: Most of the compute nodes have now been brought back online. There are still a few nodes that need more checking before being made available for jobs.

Update 2020-12-22, 18:04: The cooling system has been stable for the last hour after making some adjustments together with the vendor. We are slowly bringing up the nodes.

Update 2020-12-22, 16:01: In order to keep the cooling as stable as possible, we have decided to take down all high memory nodes. This way we can keep some of the normal compute nodes up for the time being. We are also working together with the vendor to make adjustments on the cooling system to ensure continued stability.

We are very sorry about the inconvenience.

Update 2020-12-22, 13:41: We have identified the cause to be the cooling system and are working on mitigating the issues. Most of the compute nodes must remain down while doing so, unfortunately.

Update 2020-12-24 10:30: The compute nodes shut down again due to electrical problems in the machine room. According to the machine room service department the problem has been resolved, and we are working to bring all nodes back up.

Update 2020-12-24 12:10: Most of the compute nodes on Fram are back online.

Please clean up data on Saga!

Dear users on Saga,

Currently, usage on Saga’s parallel file system (everything under /cluster) is at about 93%. Some of the file system servers are already not accepting new data. If usage increases further, the performance of the parallel file system may soon drop significantly, some users may then experience data loss, and finally the whole cluster may come to a complete halt.


Therefore, we kindly ask all users with large usage (check with the command dusage) to clean up unneeded data. Please check all locations where you store data, that is, $HOME, $USERWORK, project folders (/cluster/projects/...) and shared folders (/cluster/shared/...). In particular, we ask users whose $HOME quota is not (yet) enforced (see line with $HOME in example below) to reduce their usage as soon as possible. The quota for $HOME, if set, is 20 GiB.

[saerda@login-3.SAGA ~]$ dusage -u saerda
Block quota usage on: SAGA
File system   User/Group    Usage      SoftLimit   HardLimit
saerda_g      $HOME         6.9 TiB    0 Bytes     0 Bytes
saerda        saerda (u)    2.8 GiB    0 Bytes     0 Bytes

In parallel, we are trying to help users to reduce their usage and to increase the capacity of the file system, but these measures usually take time.
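To decide what to clean up, it helps to know which directories hold the most data. The following is a minimal sketch, not a Saga tool: it sums file sizes per immediate subdirectory of a path you choose and lists the largest first. The function names are hypothetical, and the numbers are apparent file sizes, so they may differ from what dusage reports. Walking a very large tree puts load on the file system, so please run it on one folder at a time.

```python
import os


def directory_sizes(root):
    """Return {subdirectory name: total bytes of the files below it}
    for each immediate subdirectory of `root`."""
    sizes = {}
    with os.scandir(root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            # os.walk does not follow symbolic links by default.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    try:
                        total += os.lstat(os.path.join(dirpath, name)).st_size
                    except OSError:
                        # File vanished or is unreadable: skip it.
                        pass
            sizes[entry.name] = total
    return sizes


def largest(root, count=10):
    """Return up to `count` (name, bytes) pairs, largest first."""
    sizes = directory_sizes(root)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    # Example: inspect the home directory of the current user.
    for name, size in largest(os.path.expanduser("~")):
        print(f"{size / 2**30:8.2f} GiB  {name}")
```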

Many thanks in advance!

Security alert: Please update your SSH keys

There is an ongoing attack against academic HPC centers in Europe right now, and several clusters and storage systems have been compromised. The attackers have used stolen credentials (passwords and/or SSH keys) to get into systems. We are investigating whether any of our systems are affected. In the meantime we encourage everyone to create new SSH keys. See https://documentation.sigma2.no/getting_started/create_ssh_keys.html for a description of how to create SSH keys. Please do set a passphrase on the keys, so they will be worthless if they should be stolen. Please also remember to remove old SSH keys from your ~/.ssh/authorized_keys file on each system (NIRD, Fram, Saga, Stallo, Vilje), so no one can use the old keys any more.
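Removing old keys from ~/.ssh/authorized_keys can be done in any text editor. For those with many entries, the sketch below shows one possible way to filter them. It assumes you still have the old public key lines (for example the contents of an old .pub file) and drops every authorized_keys entry that contains the same key data. The function name is hypothetical; please keep a backup of the file and check the result before replacing anything.

```python
def prune_authorized_keys(authorized_keys_text, old_public_keys):
    """Return the authorized_keys content without entries that match
    any of the old public keys.

    Each item in `old_public_keys` is a line in .pub format:
    "<key type> <base64 key data> [comment]".
    """
    # The second field of a .pub line is the base64 key data.
    old_blobs = {key.split()[1] for key in old_public_keys}

    kept = []
    for line in authorized_keys_text.splitlines():
        stripped = line.strip()
        is_entry = bool(stripped) and not stripped.startswith("#")
        # Compare whole fields, so entries with leading options still match.
        if is_entry and old_blobs & set(stripped.split()):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Shortened placeholder keys, for illustration only.
    old_key = "ssh-rsa AAAAB3OLD user@laptop"
    current = (
        "# my keys\n"
        "ssh-rsa AAAAB3OLD user@laptop\n"
        "ssh-ed25519 AAAAC3NEW user@laptop\n"
    )
    print(prune_authorized_keys(current, [old_key]), end="")
```

Matching on the key data rather than on the comment is a deliberate choice here, since comments can be edited freely and the same comment is often reused for a new key.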

Fram: Lustre quota problem.

Dear Fram users,
We still have a Lustre quota problem on the Fram cluster, where the “dusage” command may give you inaccurate numbers.
To eliminate this issue we need a downtime of about 4 hours.

The date for the downtime has not been decided yet. We will give you an update as soon as we have more information.

Meanwhile, if you have any quota-related problem on Fram, please contact us.

Fram: Interconnect network manager crashed

Dear Fram users,

The Fram interconnect network manager crashed yesterday at 15:34, which left all compute nodes with degraded routing information. This can cause Slurm jobs to crash with a communication error.
The interconnect network manager is running again, all compute nodes have the latest routing information, and communication between the compute nodes is restored.
We apologize for the inconvenience. If you have any questions, please don’t hesitate to contact support.