Stallo downtime April 9-11

2019-04-11 10:30 Update: Stallo is back in service.

2019-04-02 16:35: Due to work on the electrical infrastructure in the building housing Stallo, we need to shut down Stallo in the period April 9-11. No new job will be allowed to start unless it can finish by April 9th 08:00.
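
To check whether a job can still start under the rule above, its requested walltime has to fit between the submission time and the start of the downtime. The following is a minimal sketch of that arithmetic, not the scheduler's actual logic; the function names are hypothetical, and the times are assumed to be local time without time zone handling.

```python
from datetime import datetime, timedelta

# Start of the downtime as stated in the announcement (local time assumed).
DOWNTIME_START = datetime(2019, 4, 9, 8, 0)


def max_walltime(now, downtime_start=DOWNTIME_START):
    """Longest walltime a job starting at `now` can request and still
    finish before the downtime; zero if the downtime has already begun."""
    return max(downtime_start - now, timedelta(0))


def fits_before_downtime(now, walltime, downtime_start=DOWNTIME_START):
    """True if a job starting at `now` with the given walltime ends
    no later than the start of the downtime."""
    return now + walltime <= downtime_start


if __name__ == "__main__":
    start = datetime(2019, 4, 8, 20, 0)
    print("Maximum walltime:", max_walltime(start))
    print("12 h job fits:", fits_before_downtime(start, timedelta(hours=12)))
```

In practice a job also waits in the queue before it starts, so the usable walltime is shorter than the value computed at submission time.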

Missing mounts for Project NS2345K on NIRD

The NFS server that exports /tos-project1/NS2345K/FRAM to NIRD/FRAM crashed yesterday around 16:00. We have recovered the NFS server, and /tos-project1/NS2345K/FRAM is exported again. We are currently working on mounting the filesystem in the login containers while investigating the cause. We apologise for the inconvenience.

Scheduled downtime – NIRD storage expansion – 2nd of April

Update:

  • 2019-04-03 13:25: NIRD and the Service Platform are back in production.
  • 2019-04-03 10:59: Maintenance work has finished. We are now restarting the filesystems and services.
  • 2019-04-03 08:22: Disk expansion and rebalancing are finished. Hardware checks are ongoing and should finish in a couple of hours. We will keep you posted.
  • 2019-04-02 09:55: NIRD filesystems are unmounted from Fram, and replicated data is available read-only through login-trd.nird.sigma2.no
  • 2019-04-02 08:06: Maintenance work has started.

Dear NIRD User,

NIRD and the Service Platform will be under maintenance to expand the disk capacity in Tromsø.

The storage expansion and disk pool rebalancing will start on the 2nd of April at 8:00 am CET and will last at most 2 days. During the maintenance, the services running on the NIRD Service Platform and on the NIRD Toolkit will not be available.

During the downtime we plan to make project data mirrored to Trondheim available in read-only mode through a specially built login node. This solution will be tested under real load for the first time during this downtime, so we might encounter some technical difficulties.
That being said, to access the remote, mirrored data, please log in to login-trd.nird.sigma2.no.

We apologise for the inconvenience.
Metacenter Operations

MDS crash on Fram

2019-03-19 17:55: The secondary MDS server for the Lustre filesystem crashed between 17:15 and 17:45. The primary MDS server took over and restored the filesystem around 17:45. Some of the jobs running on Fram might be affected. We are investigating the root cause of the incident.

MDS crash 13.03.2019

The main MDS server for the Lustre filesystem crashed between 14:00 and 14:30. The secondary MDS server took over and restored the filesystem around 14:40. Some of the jobs running on Fram might be affected. We are investigating the root cause of the incident.

Short outage on Service Platform

The servers running the Service Platform and the NIRD login nodes have some issues with the remote filesystems.
The problem has been identified and is being fixed, but you might experience short hiccups until the fix is applied on all affected nodes.
You will have to log in again to the NIRD login nodes.

We expect the issue to be resolved within one hour at most.