FRAM – critical storage issue

UPDATE:

  • 2020-03-12 10:45: Maintenance is finished and the faulty components have been replaced. We continue to monitor the storage system.
    Thank you for your understanding.
  • 2020-03-11 10:16: We have to replace one hardware module on the Fram storage system. The maintenance will be carried out while the system stays online. However, there will be a short interruption of up to 5 minutes while we fail over components on the redundant path, which may cause some jobs to crash.
  • 2020-03-05 20:30: Maintenance is over, Fram is online. Jobs that were running before the maintenance may have been re-queued. Some jobs may also have been killed; we apologize for that. If this is the case, you will have to resubmit your job.

Dear FRAM users,

We are facing a major issue with FRAM’s storage system. Work to mitigate the issue is under way. We will have to take the whole machine offline to be able to carry out the necessary tasks.