2019-05-20: Job throughput on Fram is reduced from May 20, 14:00 to May 21, 16:00, due to service work on Fram's water cooling system.
Dear Fram users,
The Fram filesystem, most critically /cluster/work and /cluster/home, is running out of inodes; only 8% remain. If we run out of inodes, it will not be possible to create new files. To avoid data loss and job crashes, we kindly ask all of you to delete any files you no longer need.
Jon, on behalf of the Operations Team
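Since every file, directory and symlink consumes one inode, the directories worth cleaning up first are usually those with the most entries, not necessarily the largest ones. The following is a minimal sketch, not an official tool, for finding them with the Python standard library. The function names and the script name are hypothetical, and the count is an approximation: hard links that share an inode are counted once per name.

```python
import os
import sys


def count_entries(root):
    """Count files, directories and symlinks below root (roughly one inode each)."""
    total = 0
    # os.walk does not descend into symlinked directories by default;
    # unreadable directories are silently skipped.
    for _dirpath, dirnames, filenames in os.walk(root, onerror=lambda e: None):
        total += len(dirnames) + len(filenames)
    return total


def entries_per_subdir(root):
    """Return (count, path) pairs for each direct subdirectory of root, largest first."""
    results = []
    with os.scandir(root) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                # +1 accounts for the subdirectory itself.
                results.append((count_entries(entry.path) + 1, entry.path))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Hypothetical usage: python count_inodes.py <directory>
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for count, path in entries_per_subdir(root):
        print(f"{count:>10}  {path}")
```

Run it on a directory you own under /cluster/work or /cluster/home. Walking a large tree puts load on the filesystem's metadata servers, so it is best to scan only your own directories and to avoid repeating the scan more often than needed.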
2019-04-17 09:30: Due to the ongoing restoration of a large fileset on NIRD, users may experience significant slowness. The restoration is expected to take 5 days. Sorry for the inconvenience.
2019-04-15 10:30: Due to the ongoing restoration of NS2345K on NIRD, the FRAM mountpoint is now located at /tos-project1/NS2345Ktmp/FRAM.
2019-04-11 10:30: Update: Stallo is now back in service.
2019-04-02 16:35: Due to work on the electrical infrastructure in the building housing Stallo, we need to shut down Stallo during the period April 9-11. Jobs that cannot finish by April 9, 08:00 will not be allowed to start.
login-1-1 on Fram became unresponsive and we had to reboot the node.
The login node should be operational again shortly.
Missing mounts for Project NS2345K on NIRD
The NFS server that exports /tos-project1/NS2345K/FRAM to NIRD/FRAM crashed yesterday around 16:00. We have recovered the NFS server and /tos-project1/NS2345K/FRAM is exported again. We are currently working on mounting the filesystem in the login containers while investigating the cause. We are sorry for the inconvenience caused.
- 2019-04-03 13:25: NIRD and the Service Platform are back in production.
- 2019-04-03 10:59: Maintenance work has finished. We are now bringing the filesystems and services back up.
- 2019-04-03 08:22: Disk expansion and rebalancing are finished. Hardware checks are ongoing and should finish in a couple of hours. We will keep you posted.
- 2019-04-02 09:55: NIRD filesystems are unmounted from Fram, and replicated data is available read-only through login-trd.nird.sigma2.no.
- 2019-04-02 08:06: Maintenance work has started.
Dear NIRD User,
NIRD and the Service Platform will be under maintenance to expand the disk capacity in Tromsø.
The storage expansion and disk pool rebalancing will start on the 2nd of April at 8:00 am CET and will last at most 2 days. During the maintenance, the services running on the NIRD Service Platform and on the NIRD Toolkit will not be available.
During the downtime we plan to make project data mirrored to Trondheim available in read-only mode through a specially built login node. This solution will be tested under real load for the first time during this downtime, so we might encounter some technical difficulties.
To access the remote, mirrored data, please log in to login-trd.nird.sigma2.no.
We apologise for the inconvenience.
2019-03-19 17:55: The secondary MDS server for the Lustre filesystem crashed between 17:15 and 17:45. The primary MDS server took over and restored the filesystem around 17:45. Some of the jobs running on Fram might be affected. We are investigating the root cause of the incident.
The main MDS server for the Lustre filesystem crashed between 14:00 and 14:30. The secondary MDS server took over and restored the filesystem around 14:40. Some of the jobs running on Fram might be affected. We are investigating the root cause of the incident.