Clean-up of Fram filesystem needed

Dear Fram users,

The Fram filesystem, and most critically /cluster/work and /cluster/home, is running out of inodes: only 8% are left. If we run out of inodes, it will not be possible to create new files. To avoid loss of data and job crashes, we kindly ask all of you to delete, where possible, files that you no longer need.
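
To find out where most of your files are, a small script along the following lines may help. It is only a sketch, not an official tool: the function names count_entries and print_top are made up for this example, and the counts are an approximation of inode usage (hard-linked files are counted once per name).

```python
import os
from collections import Counter


def count_entries(root):
    """Map each immediate child of `root` to the number of filesystem
    entries (files, directories, symlinks) at or below it."""
    counts = Counter()
    with os.scandir(root) as it:
        children = list(it)
    for child in children:
        total = 1  # the child itself occupies one inode
        if child.is_dir(follow_symlinks=False):
            # os.walk does not descend into symlinked directories by default
            for _dirpath, dirnames, filenames in os.walk(child.path):
                total += len(dirnames) + len(filenames)
        counts[child.name] = total
    return counts


def print_top(root, n=10):
    """Print the `n` children of `root` that hold the most entries."""
    for name, entries in count_entries(root).most_common(n):
        print(f"{entries:10d}  {name}")
```

Calling print_top on your own directory under /cluster/work or /cluster/home lists the subdirectories holding the most entries, which are usually the best candidates for clean-up. Walking a large tree puts load on the filesystem itself, so it is best to run this on one directory at a time.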

Best regards,

Jon, on behalf of the Operations Team

Stallo downtime April 9-11

2019-04-11 10:30 Update: Stallo is now back in service.

2019-04-02 16:35: Due to work on the electrical infrastructure in the building housing Stallo, we need to shut down Stallo in the period April 9-11. No new job will be allowed to start unless it can finish by April 9th 08:00.
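
As an illustration of the cut-off above, the longest run time a job can request and still start before the downtime can be worked out as follows. This is a sketch only: the helper names are invented for this example, and the D-HH:MM:SS output assumes a Slurm-style time limit format.

```python
from datetime import datetime, timedelta


def max_walltime(now, downtime_start):
    """Longest run time a job starting at `now` can have and still finish
    before `downtime_start`; zero if the downtime has already begun."""
    return max(downtime_start - now, timedelta(0))


def as_slurm_time(delta):
    """Format a timedelta as D-HH:MM:SS (assumed Slurm-style time limit)."""
    total = int(delta.total_seconds())
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"
```

For a job submitted at the time of this notice, as_slurm_time(max_walltime(datetime(2019, 4, 2, 16, 35), datetime(2019, 4, 9, 8, 0))) gives 6-15:25:00. In practice the job also has to wait in the queue, so the usable limit is shorter.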

Missing mounts for Project NS2345K on NIRD

The NFS server that exports /tos-project1/NS2345K/FRAM to NIRD/FRAM crashed yesterday around 16:00. We have recovered the NFS server, and /tos-project1/NS2345K/FRAM is exported again. We are currently working on mounting the filesystem in the login containers while investigating the cause. We apologise for the inconvenience.

Scheduled downtime – NIRD storage expansion – 2nd of April

Update:

  • 2019-04-03 13:25: NIRD and the Service Platform are back in production.
  • 2019-04-03 10:59: Maintenance work has finished. We are now bringing the filesystems and services back up.
  • 2019-04-03 08:22: Disk expansion and rebalancing are finished. Hardware checks are ongoing and should finish in a couple of hours. We will keep you posted.
  • 2019-04-02 09:55: NIRD filesystems are unmounted from Fram, and replicated data is available read-only through login-trd.nird.sigma2.no.
  • 2019-04-02 08:06: Maintenance work has started.

Dear NIRD User,

NIRD and the Service Platform will be under maintenance to expand the disk capacity in Tromsø.

The storage expansion and disk pool rebalancing will start on the 2nd of April at 8:00 am CET and will last for a maximum of 2 days. During the maintenance, the services running on the NIRD Service Platform and on the NIRD Toolkit will not be available.

During the downtime we plan to make project data mirrored to Trondheim available in read-only mode through a specially built login node. This downtime is the first time the solution will be tested under real load, so we might encounter some technical difficulties.
To access the remote, mirrored data, please log in to login-trd.nird.sigma2.no.

We apologise for the inconvenience.
Metacenter Operations

MDS crash on Fram

2019-03-19 17:55: The secondary MDS server for the Lustre filesystem crashed between 17:15 and 17:45. The primary MDS server took over and restored the filesystem around 17:45. Some of the jobs running on Fram might have been affected. We are investigating the root cause of the incident.

MDS crash 13.03.2019

The main MDS server for the Lustre filesystem crashed between 14:00 and 14:30. The secondary MDS server took over and restored the filesystem around 14:40. Some of the jobs running on Fram might have been affected. We are investigating the root cause of the incident.