2018-11-12 11:55: Login node and services are back in production.
2018-11-12 10:20: The disk pool RAID sets had been rebuilt by Saturday, but a set of drives then failed again. A new rebuild was started, and today we had to reset the I/O card and power cycle the storage. At this point everything on the storage side is up and functional, and the file system is up. We are currently switching geo-replication back and expect to reopen access around 12:00 PM today. We will keep you posted.
2018-11-09 13:59: The firmware has now been applied without any problems. However, we still need to wait for a rebuild to finish; the estimated time remaining is 12 hours. We will open the system for regular use as soon as we can.
2018-11-09 12:45: Most of the rebuilds are complete and we are currently patching the firmware on the disk enclosures. If all goes well, we expect to have NIRD up and functional during the day today. We will keep you updated.
2018-11-08 13:27: The firmware update is running. We have to wait for the rebuild of the broken drives before we can upgrade the enclosures and finish the emergency maintenance. We do not expect the rebuild to be finished before tomorrow (Friday, November 9th). Hence the system as a whole will not be available before tomorrow.
We are very sorry for any inconvenience this may cause.
Dear Fram User,
Some of you might have experienced sporadic I/O hangs on Fram in the past period.
In many cases the I/O hangs were caused by overloading the RPC queue on the NFS-mounted /nird/home file system. This had a negative performance impact on the compute nodes and in some cases led to job crashes.
Therefore we have decided to migrate all Fram users' $HOME directories from /nird/home/$USER to /cluster/home/$USER, starting with the next scheduled maintenance. Preparations have been made and some accounts were already synchronized over during the past few weeks.
Because we suddenly lost a large number of disks on NIRD today, we have decided to stop all user I/O on NIRD and migrate the remaining user accounts over to Fram in order to avoid data loss.
As of today (2018-11-07), /nird/home is unmounted from Fram, but it is still available on NIRD. Until the next maintenance, we have created a symbolic link from /nird/home to /cluster/home so that any scripts using the old path can be adjusted.
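For users who want to adjust their scripts before the symbolic link is removed, the path change can be sketched as a small Python helper. This is only an illustration, not a tool provided on Fram: the function name `rewrite_home_paths` and the matching rules are assumptions. It replaces the /nird/home prefix with /cluster/home only when it appears as a complete path prefix. Please review every change by hand, since some references to /nird/home (for example backup locations on NIRD) may be intentional.

```python
import re
from typing import Tuple

OLD_HOME = "/nird/home"     # old NFS-mounted home prefix (from the announcement)
NEW_HOME = "/cluster/home"  # new home prefix on Fram (from the announcement)

# Match OLD_HOME only as a whole path prefix:
# - not preceded by a path-like character (so /projects/nird/home is left alone)
# - not followed by a name-like character (so /nird/homework is left alone)
_PATTERN = re.compile(r"(?<![\w/.-])" + re.escape(OLD_HOME) + r"(?![\w.-])")


def rewrite_home_paths(text: str) -> Tuple[str, int]:
    """Hypothetical helper: return (rewritten text, number of replacements)."""
    return _PATTERN.subn(NEW_HOME, text)


if __name__ == "__main__":
    sample = 'cd /nird/home/$USER/run\nexport DATA="/nird/home"\n'
    updated, count = rewrite_home_paths(sample)
    print(f"{count} replacement(s)")
    print(updated, end="")
```

The helper works on text only, so a script can be read, checked and written back under your own control.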
As soon as NIRD disk issues are remediated, nightly backups will be taken from Fram to /nird/home/$USER/backup/fram.
This step makes Fram less dependent on NIRD, so from this point on we will be able to schedule maintenance on NIRD without affecting running jobs.
Thank you for your understanding!
- 2018-11-07 16:36: We will have to upgrade the firmware on all the storage enclosures in NIRD and rebuild the failed volumes. We will keep you updated and reopen access to NIRD and the Service Platform as soon as the emergency maintenance is complete.
- 2018-11-07 14:53: User home directories have been migrated over to /cluster/home and Fram is starting up again. We will soon reopen access to Fram. Please note that NIRD project areas will _not_ be available until NIRD is up again.
Due to disk failures on NIRD, we have to shut down Fram, NIRD and the Service Platform immediately to avoid losing user data. This means stopping all jobs and user processes, and logging users out of the systems.
We will try to copy the home directories from NIRD to Fram to be able to start up Fram again without needing to mount NIRD. If this is successful, we will be able to start up Fram again, hopefully later today. (Note that the NIRD project areas will _not_ be available until NIRD is up again.)
We will update this post with more information when we know more.
Please use version 5.4.4-intel-2018a instead.
Due to unforeseen issues with the hardware yesterday, the maintenance will continue today.
More short downtimes of the services are therefore to be expected.
Sorry for the inconvenience.
19-10-2018: The maintenance was completed yesterday evening.
Thank you for your patience.
We are currently experiencing a file system issue on the service platform affecting all the services. We are working on it.
The NIRD service platform will undergo maintenance on Wednesday 17 October between 9 am and 5 pm.
Short downtimes of the services running on the platform can be expected during that day.
Sorry for the inconvenience.
VASP 5.4.4 (module VASP/5.4.4-intel-2018a) has now been updated with transition state tools (vTST), the implicit solvation model (VASPsol) and occupation matrix control, all available with the unmodified, abfix and noshear modifications of the code. Binary names should be self-explanatory; please look in bin for all versions.
Gaussian 16, minor release B.01, is now installed on Fram.
We have put in place a second NIRD login node.
This node is accessible at
Report problems to firstname.lastname@example.org.