The Fram nodes hugemem-1-1 and hugemem-1-2 will be unavailable from Thursday 2nd February.
These nodes will be moved to Saga. All projects using hugemem nodes on Fram are encouraged to request access to Saga.
The nodes are expected to become available on Saga some time during February.
Dear Betzy user: The filesystem is yet again almost full. This causes substantial performance degradation.
This is how you can help:
Please look through your own files and your project's files on Betzy and delete everything that is not strictly needed.
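As a rough aid for the cleanup requested above, the following Python sketch lists the largest files under a directory that have not been modified for a while. It only reports files and deletes nothing. The function name `largest_files` and the age threshold are illustrative assumptions, not a site-provided tool or policy.

```python
import heapq
import os
import stat
import time


def largest_files(root, top_n=20, min_age_days=0):
    """Return up to top_n (size_bytes, path) tuples under root, largest first.

    Only regular files last modified at least min_age_days days ago
    are considered. Nothing is deleted; this is a report only.
    """
    cutoff = time.time() - min_age_days * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other special files
            if info.st_mtime <= cutoff:
                candidates.append((info.st_size, path))
    return heapq.nlargest(top_n, candidates)
```

Calling, for example, `largest_files(project_dir, top_n=20, min_age_days=30)` with your own project directory gives a short list of candidates to review by hand before removing anything. Walking a very large directory tree puts load on the file system too, so it is best run on one subdirectory at a time.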
Dear Fram user. We need to postpone the maintenance on the Fram cooling system until 2 February due to unexpected internal complications at the service vendor. We apologize for the inconvenience this may cause you.
While the maintenance is ongoing, fewer compute nodes will be available.
During the next few days we will test and possibly change some InfiniBand configuration settings on Betzy. This should not affect jobs, but some compute nodes might become unavailable for short periods of time.
Dear Betzy user. We will update firmware on some of the Betzy storage servers on Wednesday morning (2022-12-14) between 08:00 and 10:00. Jobs should not be affected, but you might experience slow access to files for a few minutes at a time.
We apologize for the inconvenience.
Parts of the NIRD-TRD file systems are unavailable. We are working with the vendor on a solution. It might be necessary to take the whole system offline on short notice.
Update 2022-10-17, 14:47: Unfortunately we have to turn off most of the NIRD Trondheim cluster at this time. The system might be unavailable until tomorrow, but we will keep you informed about the availability.
This affects the following services:
– Data storage for projects with primary site in Trondheim
– Services running on the service platform in Trondheim
– NIRD mounts on Saga and Betzy
Update 2022-10-17, 17:30: NIRD-TRD and the attached services are back in production.
We are going to conduct a file system check on the Fram file system. This might lead to degraded performance while the scan is ongoing.
One of the Betzy file system disk controllers has an issue that needs to be rectified. This will cause lower performance on the file system for a couple of hours today, starting at 13:00.
The postponed Fram cooling maintenance will take place from Wednesday 27th at 07:00 until Thursday 28th at 20:00. Depending on the amount of work and diagnostics needed, it may turn out shorter or longer.
The maintenance will be conducted to diagnose and mitigate recent cooling issues and related crashes we have seen on Fram. We hope to have a better view of the situation and an action plan for making the whole cluster available for production. Currently we are at 80% capacity and will continue at that level to keep the system stable.
Dear Fram user. We are in the process of diagnosing and fixing several issues with the Fram supercomputer. We have identified at least two hardware-related issues: problems with the cooling equipment and a faulty CPU in a non-redundant server (the queueing server). We have mitigated the issues temporarily and are slowly bringing the system back to a stable configuration.
Fram will run at 50% capacity until Tuesday 10:00 and then at approximately 75% capacity until Wednesday 10:00, when we will attempt to run at 100%.
A maintenance window is still needed to fix both the faulty CPU and the issues with cooling capacity. We will update these pages when we know more about the maintenance.
UPDATE: Fram availability is now at 80%