[FINISHED] Short 1 hour downtime on Fram, 24th November 12:00

[Update, 2021-11-24 14:15] Now the NIRD mounts are working again.

[Update, 2021-11-24 13:30] We are back in production and jobs are running as normal again. The NIRD mounts are missing on two of the login nodes, but we are working on fixing that.

[Update, 2021-11-24 12:00] The maintenance has started now.

There will be a short 1 hour downtime for Fram on 24th November, starting at 12:00.

During the downtime we will update the firmware on the InfiniBand interconnect switches.

Sigma2 router upgrade in Tromsø

Uninett will conduct a firmware upgrade of one of the routers in Tromsø this Friday, 5th November, between 11:00 and 15:00. This will not affect the internal networks on Fram, NIRD or NIRD Toolkit, or any production on the systems, but external network connections may briefly disconnect or stall.

If the upgrade is successful, the other router will be upgraded next week.

[DONE] Fram Maintenance October 6 — 8.

Update, 2021-10-11 08:15: The maintenance is now finished, and the compute nodes are in production again. (Some nodes are still down; they will be fixed and returned to production. Also, the VNC service is not up yet. We are looking into it.)

Update, 2021-10-08 15:40: We have now opened the login nodes for users again. The work on the cooling system is taking longer than we hoped, so the compute nodes will not be available until Monday morning.

Update: The maintenance stop has now started.

UPDATE OCTOBER 4TH:

Login and file system services will be available on Friday or earlier, but running jobs will not be possible until Monday morning.

There will be a maintenance stop on Fram starting Wednesday October 6 at 12:00 and ending Friday October 8 in the afternoon. All of Fram will be down and unavailable during that time. Jobs that would not finish before the maintenance starts will be left pending until after the maintenance.
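As a rough illustration of the rule above, the following Python sketch checks whether a job could still run before the stop. It assumes the scheduler compares the job's requested time limit against the start of the maintenance; the function name `finishes_before_maintenance` is hypothetical and this is not the scheduler's actual logic.

```python
from datetime import datetime, timedelta

# Start of the maintenance stop, taken from the announcement above.
MAINTENANCE_START = datetime(2021, 10, 6, 12, 0)


def finishes_before_maintenance(start: datetime, time_limit: timedelta) -> bool:
    """Return True if a job starting at `start` with the requested
    `time_limit` would end no later than the start of the maintenance.

    Hypothetical helper: it only illustrates the announced rule.
    """
    return start + time_limit <= MAINTENANCE_START


if __name__ == "__main__":
    # A job that could start on October 5 at 12:00.
    start = datetime(2021, 10, 5, 12, 0)
    print(finishes_before_maintenance(start, timedelta(hours=23)))
    print(finishes_before_maintenance(start, timedelta(hours=25)))
```

Under this assumption, requesting a shorter time limit is what lets a job fit in before the stop; a job whose limit reaches past 12:00 on October 6 would stay pending.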

The main reason for the maintenance is the replacement of some parts of the cooling system. During the stop, the OS of the compute and login nodes will be updated from CentOS 7.7 to 7.9, and Slurm will be upgraded to 20.11.8 (the same version as on Saga).

[Resolved] login-1.fram crashed – VNC unavailable

One of the login nodes on Fram crashed unexpectedly this morning, causing some users to be disconnected from their sessions.

This also affects the VNC service on Fram. Any attempt to use this service will fail while we’re working on restoring the node.

Updates will be provided once we have more information to share.

Update 13:00 – The node, NIRD exports and VNC service are now back up and running and have been put back into production. Please let us know if you experience any issues.

We’re very sorry for any inconvenience this may have caused.

FRAM – controller maintenance

Good morning,

we are going to perform some routine maintenance on one of the file system controllers of FRAM. This should have no significant implications for production, though users might experience slightly degraded Lustre (file system) performance.

The operation is scheduled for today – 11 a.m. …

Update 8.07: There were also performance issues with the login nodes. These issues are now resolved and the controller maintenance is finished.

FRAM – Unexpected shutdown

We are experiencing some trouble with the FRAM machine. Yesterday morning (Sunday 04.07.2021) many compute nodes went down unexpectedly. We are investigating the issue.

Update 05.07.2021 – 10:54: The shutdown was caused by a power outage in the data center. We are bringing all nodes back up and monitoring their behavior.

Apologies for the inconvenience this may have caused!