Scheduled downtime on the 12th of February

Update:

  • 2019-02-15 11:18: We are still experiencing problems with the storage system on Fram. Disks begun to mass-fail once again after the system seemed to be stable during the night. We are depending on the vendor to resolve these issues and we are working closely with them.
    Based on the new instability we can not give an estimate for when the system will be ready for general use again. This is an unfortunate situation and we understand the impact on you, and thus we try all possible solutions to keep your data safe and bring up the system as soon as possible.

    The OpsLog will be updated with new information when the status of the situation changes.

  • 2019-02-14 13:17: Due to missing parts, and the size of the storage, disk recovery is progressing slowly ahead on approximately 50% reduced performance. Current ETA are:
    • Fram: 15.02.2019
    • NIRD: 19.02.2019
    • Service Plattform: 19.02.2019
  • 2019-02-13 19:07: Communication with the missing storage enclosures were re-established and disk pools are rebuilding at this time. Unfortunately we can not reopen machines until disk pools are stabilized. We will have a new round of checks and risk analysis tomorrow morning. Will keep you updated here.
  • 2019-02-13 11:33: Some of the parts arrived to the datacenter and we are working with the vendor on replacing and pathing the firmware on Fram. More details to follow as we know more.
  • 2019-02-12 15:38: NIRD Tromsø and Fram storages have each one disk enclosure which failed. We are waiting for replacement parts to arrive. After replacement we will have to rebuild disk pools before re-opening machines for production.Current estimate is tomorrow evening. Will keep you updated.
  • 2019-02-12 12:36: Firmware upgrade on NIRD is finished. We are proceeding to start back NIRD services. Will keep you posted.
  • 2019-02-12 08:17: Maintenance has started.
  • 2019-02-11 13:20: Due to the disk problems accellerating during the weekend, we have now changed the maintenance stop reservation so no new jobs will start until the maintenance is done.  Already running jobs will not be affected, but no new jobs will start.  This has been done to reduce the risk of data loss.

We need to have a scheduled downtime on a relatively short notice in order to upgrade the firmware on both Fram and NIRD (including NIRD Toolkit) storages.
This is a critical and mandatory update which will increase stability, performance and reliability of our systems.

The downtime is expected to last no more than a working day.

Fram jobs which can not finish by the 12th of February, are queued up and will not start until the maintenance is finished.

Thank you for your understanding!
Metacenter Operations