The queueing system on Vilje has crashed. We are working on a fix.
Fram /cluster filesystem is working again
Fram is experiencing a file system crash again. We are working to fix the problem.
17-06-2020 : Fram is operating normally
We are experiencing some issues with one of our file servers, and we also have degraded performance on one of the interconnect links in the InfiniBand fabric. This has led to some instability over the last 24 hours and may affect running jobs and file system operations. We hope to fix the issue during next week.
All services on Fram and NIRD are now back in production, except for slurmbrowser and desktop.fram.sigma2.no.
Here is a list of what has been done during the last four days:
- Firmware upgrade on NIRD in Trondheim and Tromsø
- Firmware upgrade on NIRD Toolkit
- Firmware upgrade on Fram storage, Fram nodes, switches, etc.
- Software/OS upgrade on NIRD Trondheim and Tromsø
- Software/OS upgrade on NIRD Toolkit
- Software/OS upgrade on Fram nodes
In total, including vendors, approximately 15 people were involved in the upgrade.
We thank you for your patience.
Due to underlying hardware issues, the tos-project3 filesystem is set to READ-ONLY while we investigate the issue.
These are the projects affected:
login-2.saga.sigma2.no is currently down due to several faulty memory DIMMs. This also affects the use and functionality of slurmbrowser and desktop.saga.sigma2.no.
We hope to have the DIMMs replaced some time during the week.
23 April – 18:50 NIRD and the NIRD Toolkit services are now back in production.
24th April: Fram is back in production.
WARNING: MAINTENANCE IS CURRENTLY ONGOING!
Dear NIRD, NIRD Toolkit, and Fram users,
We will have a four-day scheduled maintenance on NIRD, NIRD Toolkit and Fram starting on the 20th of April, 09:00 AM.
Running HPC jobs and logging in to Saga is NOT affected.
NIRD connectivity and backup of files from Saga ARE affected.
During the maintenance we will:
- carry out software and firmware updates on all systems
Files stored on NIRD will be unavailable during the maintenance, and therefore so will the services. This will of course also affect the NIRD file systems available on Fram and Saga.
Login services to NIRD, NIRD Toolkit and Fram will be disabled during the maintenance.
Please note that backups taken from the Fram and Saga HPC clusters will also be affected and will be unavailable during this period.
Please accept our apologies for the inconvenience this downtime is causing.
The Vilje queueing system was unavailable from Sunday 5th 15:30 until Monday 6th 08:30, due to a faulty InfiniBand cable.
We apologize for the inconvenience.
The Vilje filesystem has been fixed with good help from DDN, and we are now open for business.
Please be aware that some files may have been lost.
Always back up your files.