- 2019-10-18 14:36 We have finished the reinstallation, configuration checks, QA and tests. Access to the machine has been reopened and queued jobs are running again.
- 2019-10-18 06:12 Reinstallation of compute nodes is much slower than anticipated and thus re-opening of the machine is delayed. We are doing our best to finish the maintenance as soon as possible. In parallel we are conducting tests and benchmarks.
We will keep you updated.
- 2019-10-17 08:25 File system servers and infrastructure switches were patched yesterday.
We are proceeding now with the upgrade of the service and the login nodes.
- 2019-10-16 08:07 Maintenance has started.
Dear Fram User,
We will have a two-day planned downtime starting from 08:00 on the 16th of October for maintenance on the storage and the file system.
During this time we will, together with the vendor, upgrade the storage firmware, upgrade the software on the /cluster file system servers and upgrade the operating system on Fram.
This upgrade is necessary to fix the frequent issues with the metadata servers and enhance stability and security of the system.
Fram jobs that cannot finish before the 16th of October will remain queued and will not start until the maintenance is finished.
Thank you for your consideration!
The node mentioned above has to be rebooted due to its unresponsiveness. We are sorry for any inconvenience.
The login-1-1 node hung and had to be rebooted. It is up and running again now. Have a nice weekend!
Fram login-1-2 was rebooted around 15:10 today due to a Lustre file system glitch.
We are pleased to announce that the Saga HPC cluster is now open for production for existing pilot users.
Candidate projects for migration from the Abel cluster will be contacted directly.
Link to the Saga documentation page.
We are now back in full production, two nodes short of the full 1404-node capacity. A few jobs were lost because the Lustre file system was missing on a few nodes, which in turn was due to a faulty interconnect (InfiniBand) cable.
Thank you for your patience.
We are currently experiencing a network error on Vilje, causing around 100 nodes to be unavailable until further notice. Some jobs may be lost.
We apologize for the inconvenience.
- 2019-08-26 12:45: NIRD project areas are mounted on Fram login nodes.
- 2019-08-25 14:15: The Service Platform is up now; you can log in to NIRD and access your files.
NIRD project areas will be reconnected to Fram tomorrow.
- 2019-08-23 18:42: The vendor started a forced health check on the system, which is taking more time than expected. We will re-open access to NIRD and the Service Platform as soon as checks and rebuilds are finished.
- 2019-08-23 08:05: The storage vendor has finished the hardware replacements and installation of new firmware on the storage system.
We are currently monitoring the storage system together with the vendor.
Dear NIRD and Service Platform users,
We have a planned downtime on the 22nd of August to replace some defective hardware. Systems will be taken offline starting from 08:00.
An engineer from the storage vendor will assist us from the very first hour.
We expect the maintenance to finish in one and a half days.
NIRD projects will still be accessible during the maintenance from login-trd.nird.sigma2.no, but in read-only mode.
We will keep you updated here.
Sorry for the short notice.
There is a power outage in Tromsø affecting parts of campus right now. Stallo is down, Fram is still up.
Dear Fram cluster users:
We have a problem with the cooling system in the Fram machine room.
Because of this, we have to reduce the load on the cluster by reserving the entire cluster, which means no jobs will run.
We are sorry for the inconvenience, and we will keep you updated.
Update 2019.07.11 08:00: Fram should be fully operational again. We are monitoring the machine and releasing compute nodes back to production.
Update 14:55: Some of the nodes have crashed, so some jobs may have been killed.
Update 2019.07.03 10:55: To keep the machine room temperature reasonably low with only one working CDU, we have kept 495 nodes in maintenance state, while 197 nodes are in down state. We will monitor the power consumption in the machine room and release more nodes accordingly.
Update 2019.07.08 11:55: Fram is expected to be back to its full capacity on Wednesday, 2019.07.10.