MDS crash on Fram

2019-03-19 17:55: The secondary MDS server for the Lustre filesystem crashed between 17:15 and 17:45. The primary MDS server took over and restored the filesystem around 17:45. Some of the jobs running on Fram might be affected. We are investigating the root cause of the incident.

MDS crash 13.03.2019

The main MDS server for the Lustre filesystem crashed between 14:00 and 14:30. The secondary MDS server took over and restored the filesystem around 14:40. Some of the jobs running on Fram might be affected. We are investigating the root cause of the incident.

Short outage on Service Platform

Servers running the Service Platform and NIRD login nodes have some issues with the remote filesystems.
The problem has already been identified and is being taken care of, but you might experience short hiccups until it is fixed on all the affected nodes.
You will have to log in to the NIRD login nodes again.

We expect to be ready within one hour at most.

Fram is in service again

Dear Fram users,

after more than 16 days of downtime and the loss of more than 12 million CPU hours, we were finally ready to return the system to service tonight.


This downtime has been a very unpleasant experience for us, and we fully understand that it has been frustrating and distressing for the users who depend on the service.

The main reason for the downtime was severe problems with the global file system on Fram, which forced us to halt the system and escalate to the file system vendor until their engineers were able to analyse and repair the different issues experienced.

Sincerely, Jørn Amundsen, UNINETT Sigma2 AS

NIRD available again

Dear NIRD and NIRD Toolkit Users,

After a prolonged downtime due to system failures beyond our control and field of responsibility, access to NIRD has finally been reopened.
The vendor has replaced the failing hardware and we are back online. Some disk pools are still being rebuilt, which should finish within a few hours. Until then, you might notice slightly reduced performance.

We will proceed with bringing the Service Platform back up during the day today.

Thank you for your understanding and patience!
Metacenter Operations