Vilje: /work filesystem is partially down.

Dear Vilje cluster users:

The /work filesystem on Vilje is partially down. We are working on it.

At the moment it is very difficult to determine when we can bring the /work filesystem fully back online.
We will keep you posted.

Best Regards

Fram down

2019-11-13 16:15 Fram is up and running again.

One of the cooling units stopped, which caused the other to stop as well, and all compute nodes went down.

Dear Fram User,

Fram is currently down, likely due to issues with the cooling distribution unit.
We are investigating the issue and working to bring Fram back into production.

Apologies for the inconvenience!

Metacenter Operations

Slurm upgraded on Saga

Slurm was upgraded to the latest version (19.05.3-2) on Saga today. This includes a fix for the problem with using “srun” for running interactive jobs.

Please let us know if you notice anything that has gone wrong after the upgrade.

Enforcing standard quota on $HOME

Dear Fram and Saga users,

As you may know, there is a standard 20 GB block quota on $HOME on the Fram and Saga HPC resources. Until now this quota has not been enforced, but due to frequent overuse and backup limitations we are compelled to enforce it, starting on 04.11.2019.

Any project-related data must be moved to the /cluster/projects area, and unneeded data must be removed.

We have also implemented a new backup policy: any files placed under $HOME/nobackup or $HOME/tmp will be excluded from backup.
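
To get a rough idea of where you stand before enforcement starts, something like the following Python sketch may help. It is not an official tool: it sums apparent file sizes, which may differ from how the file system accounts for quota, and it assumes that "20GB" means 20 GiB. The names `tree_size` and `home_report` are made up for this illustration.

```python
import os
from pathlib import Path

# Assumption: the "20GB" quota is read as 20 GiB; the real limit may be counted differently.
QUOTA_BYTES = 20 * 1024**3
# Top-level directories under $HOME that the notice says are excluded from backup.
EXCLUDED_FROM_BACKUP = ("nobackup", "tmp")


def tree_size(root: Path) -> int:
    """Sum apparent file sizes below root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # Files may vanish or be unreadable while scanning; skip them.
                pass
    return total


def home_report(home: Path) -> dict:
    """Report total usage, usage in backup-excluded directories, and quota status."""
    total = tree_size(home)
    not_backed_up = sum(
        tree_size(home / name)
        for name in EXCLUDED_FROM_BACKUP
        if (home / name).is_dir() and not (home / name).is_symlink()
    )
    return {
        "total_bytes": total,
        "not_backed_up_bytes": not_backed_up,
        "over_quota": total > QUOTA_BYTES,
    }
```

Calling `home_report(Path.home())` on a login node returns the three values as a dictionary. Note that data under $HOME/nobackup and $HOME/tmp is skipped by the backup but is still counted in the total here.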

For more information, please check the documentation pages at https://documentation.sigma2.no/.

Thank you for your understanding!
Metacenter Operations

Saga: problems with logging in

We are currently experiencing problems with the /cluster file system on Saga. This prevents users from logging in.

We are investigating, and will update here when we know more.

Update 11:30: We have identified and solved the problem, and the /cluster file system is back online.

Planned maintenance on Fram on 16.10.2019

Update:

  • 2019-10-18 14:36 We have completed the reinstallation, configuration checks, QA and tests. Access to the machine has been reopened and queued jobs are running again.
  • 2019-10-18 06:12 Reinstallation of the compute nodes is much slower than anticipated, and the reopening of the machine is therefore delayed. We are doing our best to finish the maintenance as soon as possible. In parallel we are conducting tests and benchmarks.
    We will keep you updated.
  • 2019-10-17 08:25 File system servers and infrastructure switches were patched yesterday.
    We are proceeding now with the upgrade of the service and the login nodes.
  • 2019-10-16 08:07 Maintenance has started.

Dear Fram User,

We will have a planned two-day downtime starting at 08:00 on the 16th of October for maintenance on the storage and the file system.

During this time we will, together with the vendor, upgrade the storage firmware, upgrade the software on the /cluster file system servers and upgrade the operating system on Fram.

This upgrade is necessary to fix the frequent issues with the metadata servers and enhance stability and security of the system.

Fram jobs that cannot finish before the maintenance starts on the 16th of October will remain queued and will not start until the maintenance is finished.
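
As a rough illustration of this rule, a job can start only if its requested wall time ends no later than the start of the maintenance. The Python sketch below shows that arithmetic only; it is not how the scheduler is implemented, and the function name `can_start_before_maintenance` is invented for this example.

```python
from datetime import datetime, timedelta

# Maintenance start as announced: 16 October 2019, 08:00.
MAINTENANCE_START = datetime(2019, 10, 16, 8, 0)


def can_start_before_maintenance(
    now: datetime,
    walltime: timedelta,
    maintenance_start: datetime = MAINTENANCE_START,
) -> bool:
    """Return True if a job started at `now` would end by the maintenance start."""
    return now + walltime <= maintenance_start
```

For example, a job requesting 24 hours could still start at 08:00 on the 15th of October, while a job requesting 48 hours at that time would stay in the queue. Requesting a shorter wall time may therefore allow a job to run before the downtime.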

Thank you for your consideration!

Metacenter Operations