Planned maintenance on 12th of June

Update

  • 2018-06-12 17:16 Access to NIRD is reopened now.
  • 2018-06-12 16:05 Services are started and back in production on the NIRD Service Platform.
  • 2018-06-12 15:55 The queue reservation is now removed and jobs are running on part of Fram. The rest of the nodes will be added back to the queue as soon as they are updated.
  • 2018-06-12 14:55 Access is re-opened to Fram. Queue reservation is still in place.
  • 2018-06-12 08:34 Maintenance has started.

Dear Fram and NIRD user,

We will have a one-day planned maintenance on the 12th of June, starting at 08:30 AM.
Fram, NIRD and the Service Platform will be affected. One storage enclosure must be replaced, which requires downtime for the file systems served from NIRD.

There is a system reservation in place on Fram starting on 12.06.2018 at 08:45 AM. Jobs that cannot finish before the maintenance window will be left pending in the queue with the Reason “ReqNodeNotAvail” and will be started when the maintenance is over.
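As a rough illustration of the rule above, the sketch below checks whether a job would end before the reservation begins. It is a simplified model, not the scheduler's actual logic: the function name `fits_before_reservation` and the example submission time are hypothetical, and it assumes the decision is based on the job's requested wall time.

```python
from datetime import datetime, timedelta


def fits_before_reservation(start, walltime, reservation_start):
    """Return True if a job starting at `start` with the requested
    `walltime` would end no later than `reservation_start`."""
    return start + walltime <= reservation_start


# Reservation start taken from the announcement: 12.06.2018 08:45.
reservation_start = datetime(2018, 6, 12, 8, 45)

# Hypothetical example: a job that could start the evening before.
possible_start = datetime(2018, 6, 11, 20, 0)

# A 12-hour job would end at 08:00, before the reservation.
short_job_fits = fits_before_reservation(
    possible_start, timedelta(hours=12), reservation_start
)

# A 24-hour job would overlap the reservation and stay pending.
long_job_fits = fits_before_reservation(
    possible_start, timedelta(hours=24), reservation_start
)

print(short_job_fits, long_job_fits)
```

In practice this means that requesting a shorter wall time, where the job allows it, may let the job run before the maintenance instead of waiting in the queue.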

We will keep you updated via OpsLog/Twitter.

Thank you for your consideration!
Metacenter Operations

Two-day downtime starting on 25th of April

Update:

  • 2018-04-30 14:46 The file system issues on Fram are now solved and access is reopened. Jobs are temporarily on hold due to problems with the cooling system in the server room. As soon as that is sorted out, jobs will be permitted again.
  • 2018-04-30 10:15 We are still struggling with the /cluster file system. The problem has been escalated to the vendor. At the moment we do not have a time estimate for when Fram will be back online, but work is in progress to fix this as soon as possible, hopefully during the day.
  • 2018-04-27 18:44 Unfortunately there are still problems bringing up the Lustre file system on Fram. The issue is caused by an incompatibility affecting routing between IB networks/fabrics on the Lustre object storage servers. The vendor is now planning and working to carry out an emergency update on the system. We are sorry for the trouble.
  • 2018-04-27 16:49 Access to NIRD is reopened now.
  • 2018-04-26 22:50 We are having problems bringing up the Lustre file system on Fram. The issue has been reported to the vendor. Additionally, there are some minor issues which must be addressed on NIRD before opening it for production, but we expect to reopen access to both Fram and NIRD tomorrow.

 

Dear Fram and NIRD user,

A two-day downtime is scheduled for week 17. The scheduled maintenance will start on Wednesday, 25th of April, at 09:00 AM and will affect Fram, NIRD and the Service Platform.

During this time we will:
1. Extend NIRD storage space by ~1.1 PB.
– The new hardware will be coupled to NIRD and extra disks loaded into the system during these two days.
– Please note that the storage advertised above will not be available all at once. Storage space will be added gradually as the loaded disks are formatted and made available to the file system.
– One of our top priorities is to address the inode shortage on $HOME areas.
2. Address file system related bugs on NIRD by upgrading the relevant software and tuning some parameters on the servers.
3. Fix broken hardware on Fram.
4. Apply any outstanding patches to both Fram and NIRD.
5. Carry out maintenance work on the cooling system for Fram.

There is a job reservation in place on Fram starting at 08:45 AM on the 25th of April. Jobs that cannot complete before that time will be left pending in the queue with the Reason “ReqNodeNotAvail” and an estimated start time of 2154. They will be started when the maintenance is over.

We will keep you updated via OpsLog/Twitter.

Thank you for your consideration!
Metacenter Operations