[2021-06-25 08:45] The maintenance stop is now over, and Saga is back in full production. There is a new version of Slurm (20.11.7), and storage on /cluster has been reorganised. This should be largely invisible, except that we will simplify the dusage command output to only show one set of quotas (pool 1).
[2021-06-25 08:15] Part of the file system reorganisation took longer than anticipated, but we will start putting Saga back into production now.
[2021-06-23 12:00] The maintenance has now started.
[UPDATE: The correct dates are June 23–24, not July]
There will be a maintenance stop of Saga starting June 23 at 12:00. The stop is planned to last until late June 24.
During the stop, the queue system Slurm will be upgraded to the latest version, and the /cluster file system storage will be reorganised so all user files will be in one storage pool. This will simplify disk quotas.
All compute nodes and login nodes will be shut down during the stop, and no jobs will run. Submitted jobs estimated to run into the downtime reservation will be held in queue.
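To judge whether a job is likely to be held, compare its earliest possible start plus its requested time limit with the start of the stop. The sketch below is only an illustration under stated assumptions: `would_be_held` is a hypothetical helper, not part of Slurm, and treating a job that ends exactly at 12:00 as safe is an assumption about the boundary. The scheduler's own decision is what counts.

```python
from datetime import datetime, timedelta

# Start of the maintenance stop, taken from the announcement above.
MAINTENANCE_START = datetime(2021, 6, 23, 12, 0)


def would_be_held(earliest_start: datetime, time_limit: timedelta) -> bool:
    """Hypothetical helper: True if the job could still be running when the stop begins.

    Assumes the scheduler compares start time + requested time limit
    against the start of the downtime reservation.
    """
    return earliest_start + time_limit > MAINTENANCE_START


# A job that could start on June 22 at 12:00:
start = datetime(2021, 6, 22, 12, 0)
print(would_be_held(start, timedelta(hours=12)))  # finishes before the stop
print(would_be_held(start, timedelta(hours=48)))  # would run into the stop
```

In practice this means that requesting a shorter time limit may let a job start before the stop, provided it really can finish within that limit.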
[UPDATE, 2021-06-08 08:00] Betzy is now up and in production again.
[UPDATE] Unfortunately, the downtime is taking longer than anticipated, and will not be finished tonight. We plan on getting Betzy up again at around 08:00 tomorrow morning.
Campusservice at NTNU will conduct maintenance on the high-voltage circuits for non-redundant power on 7th June 2021, between 15:00 and 20:00. All compute nodes and login nodes will be shut down during this time, and no jobs will run. Submitted jobs estimated to run into the downtime reservation will be held in queue.
Update: The file system servers have now been fixed, and we are back online again. Thank you for your patience.
We have an ongoing performance issue with the Fram file system. We need to shut down the file servers to get this fixed, and therefore need three hours of downtime:
Fram will be unavailable on Wednesday 20th January between 12:00 and 15:00.
We are going to expand the storage on Saga. This will happen during week 50, between 7th and 11th December. Hopefully this will give us a few petabytes extra and enough storage for the lifetime of the system.
As of today, Wednesday 4th November at 08:00, Fram is down for maintenance. We will do the same exercise as on NIRD-TOS, namely change all internal cables on the storage system.
17:20 NIRD-TOS and services are now up.
Dear NIRD and NIRD Toolkit users: NIRD-TOS is currently down and will remain unavailable until Wednesday 12:00. We are replacing all cables during the next couple of days.
Note that NIRD-home is also unavailable during that time.
All remote mounts on Fram, Saga and Betzy using NIRD-TOS will be unavailable until the downtime is over.
We will have downtime the following week to try again to replace all internal cables in NIRD-TOS and Fram storage systems.
NIRD-TOS (including the Toolkit) will be down from Monday 2nd November 08:00 until Wednesday 4th 12:00.
Fram will be down from Wednesday 4th 08:00 until Friday 6th 12:00.
There is still a chance that the downtime will not happen, but proper notification will be given in the opslog. Unfortunately the current situation with Covid-19 makes it difficult to make detailed plans.
We apologize for any inconvenience.
The downtime for NIRD-TOS on 26th October until 29th October is cancelled and the downtime for Fram from 28th October until 29th of October is cancelled.
New dates for the downtime will be announced on Monday 26th or Tuesday 27th.
During the downtime we will replace all internal cables between disk controllers and disk enclosures. The firmware upgrade two weeks ago helped a lot, but we are still seeing communication errors, so the decision is to remove all cables and replace them.
There will be a scheduled downtime for Fram and NIRD during week 42.
The downtime will start on Monday 12th at 08:00 and last until Thursday 15th at 15:00. Parts of the systems may be available during this period.
The downtime on Monday is primarily for testing the UPS and power facility in the Fram datacenter, but we will at the same time start installing new firmware for all storage systems in Tromsø and NIRD Trondheim.
We need to take down Stallo for work on building infrastructure. Downtime will be from Tue June 2nd 12:00 until no later than Thu June 4th 12:00. We apologize for the inconvenience.