Dear Fram User,
Some of you may have experienced sporadic I/O hangs on Fram recently.
In many cases, the I/O hangs were caused by an overloaded RPC queue on the NFS-mounted /nird/home file system. This degraded performance on the compute nodes and, in some cases, led to job crashes.
We therefore decided to migrate all Fram users' $HOME directories from /nird/home/$USER to /cluster/home/$USER, starting with the next scheduled maintenance. Preparations have been made, and some accounts were already synchronized over the past few weeks.
Today we unexpectedly lost a large number of disks on NIRD. To avoid data loss, we have decided to stop all user I/O on NIRD and migrate the remaining user accounts to Fram.
Starting today (2018-11-07), /nird/home is unmounted from Fram but is still available on NIRD. Until the next maintenance, we have created a symbolic link from /nird/home to /cluster/home so that any scripts using the old path can be adjusted.
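To help with adjusting scripts, the following is a minimal sketch (not an official Fram tool) that lists lines in your files that still mention /nird/home and shows how each would look with /cluster/home. The function names and the "scripts" directory under $HOME are assumptions made for illustration; point it at wherever you keep your job scripts.

```python
import os

OLD_PREFIX = "/nird/home"
NEW_PREFIX = "/cluster/home"


def find_old_paths(root):
    """Return (file, line number, line) for every line mentioning OLD_PREFIX."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8", errors="ignore") as handle:
                    for number, line in enumerate(handle, start=1):
                        if OLD_PREFIX in line:
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file, skip it
    return hits


def suggest_fix(line):
    """Show how a line would look with the new home location."""
    return line.replace(OLD_PREFIX, NEW_PREFIX)


if __name__ == "__main__":
    # Hypothetical location; replace with the directory holding your scripts.
    scripts_dir = os.path.join(os.path.expanduser("~"), "scripts")
    for path, number, line in find_old_paths(scripts_dir):
        print(f"{path}:{number}: {line}")
        print(f"    suggested: {suggest_fix(line)}")
```

The sketch only reports matches and changes nothing, so each hit can be reviewed by hand before editing. Where a script only needs your home directory, referring to $HOME instead of a hard-coded path should avoid this kind of adjustment in the future.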
As soon as the NIRD disk issues are resolved, nightly backups will be taken from Fram to /nird/home/$USER/backup/fram.
This step makes Fram less dependent on NIRD. From now on, we will be able to schedule maintenance on NIRD without affecting running jobs.
Thank you for your understanding!