NIRD and Service Platform downtime – 26.06.2019

  • 2019-06-26 14:05: The vendor is meticulously checking each NIRD storage component and has decided to replace the main controller chassis.
    In the meantime we are applying firmware updates on the Service Platform to improve stability and security.
  • 2019-06-26 08:15: Maintenance has started.

Dear NIRD and Service Platform User,

We have planned a downtime on Wednesday the 26th of June, next week, to replace some defective hardware. Systems will be taken offline starting at 08:00 AM.

An engineer from the storage vendor will assist us from the very first hour.

We expect the maintenance to finish within one day.
We will keep you updated here.

Metacenter Operations

Fram development queue

Dear Fram User,

As of today we have adjusted the queue system policies to facilitate code development and testing on Fram, while at the same time limiting possible misuse of the devel queue.

The devel queue is now adjusted to allow:

  • max 4 nodes per job
  • max 30 minutes wall time
  • max 1 job per user

We have additionally introduced a short queue with the following settings:

  • max 10 nodes per job
  • max 120 minutes wall time
  • max 2 jobs per user
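
To illustrate how these limits translate into a job submission, here is a minimal example job script. This is only a sketch: it assumes the queue system is Slurm, that the queues are requested with --qos=devel (or --qos=short), and the project account nnXXXXk is a placeholder.

    #!/bin/bash
    #SBATCH --account=nnXXXXk      # placeholder; use your own project account
    #SBATCH --qos=devel            # assumed QOS name; use "short" for the short queue
    #SBATCH --nodes=2              # within the 4-node devel limit
    #SBATCH --time=00:20:00        # within the 30-minute devel wall time limit
    #SBATCH --job-name=devel-test

    # Run the program under test (placeholder executable name)
    srun ./my_test_program

With such limits in place, the scheduler would typically reject or hold jobs that request more nodes or wall time than the chosen queue allows.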

We will continue to monitor and improve the queue system. Please stay tuned.
You may find more information here.

Metacenter Operations

Fram MDS patched

Dear Fram User,

This morning, at around 09:05, the Fram metadata server crashed once again, likely impacting running jobs.

A mitigating patch was delivered by the vendor yesterday, and we used this opportunity to apply it to our metadata servers.

We will keep the system under close monitoring and continue to cooperate with the vendor on further stabilizing it.

Apologies for any inconvenience this may have caused!

Fram MDS crashed

Dear Fram User,

Once again, the Fram metadata server has crashed, likely impacting running jobs.
We are in contact with the storage vendor about patching the file system.

Apologies for the inconvenience!

Clean-up of Fram filesystem needed

Dear Fram users,

The Fram filesystem, most critically /cluster/work and /cluster/home, is running out of inodes; only 8% remain. If we run out of inodes, it will no longer be possible to create new files. To avoid loss of data and job crashes, we kindly ask all of you to delete, where possible, files that you no longer need.
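
If you are unsure how many files you own, the commands below may help you get an overview. This is only a sketch: the exact directory layout is an assumption, and lfs quota assumes a Lustre filesystem with user quotas enabled.

    # Count the files under your home directory (this may take a while);
    # repeat for your own directory under /cluster/work if applicable.
    find /cluster/home/$USER -type f | wc -l

    # On a Lustre filesystem, the "files" column of lfs quota shows how many
    # inodes you are currently using.
    lfs quota -u $USER /cluster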

Best regards,

Jon, on behalf of the Operations Team

Stallo downtime April 9-11

2019-04-11 10:30 Update: Stallo is now back in service.

2019-04-02 16:35: Due to work on the electrical infrastructure in the building housing Stallo, we need to shut down Stallo in the period April 9-11. No new jobs are allowed to start that will not finish by April 9th 08:00.