[UPDATE 3/11 09:15] All login nodes have now been online for more than a day, so we consider the problem solved.
[UPDATE 2/11 09:00] At the moment three of five login nodes are working. We are working on the other two as well and will post here when there is an update.
We are currently having trouble with the login nodes on Saga. Some login nodes are down or do not respond, so logging into Saga is nearly impossible at the moment. We are investigating the reason for this and will provide an update when the problem is fixed or when we know the cause.
We are sorry for the inconvenience this causes.
Update 2021-04-28: Settings on login-1 and login-2 were adjusted to prevent data corruption. With the new settings, no data corruption occurred while testing with regular I/O operations (cp, scp, rsync). Nevertheless, data corruption may still happen. Users are therefore advised to verify that data is not corrupted after it has been copied or transferred. If you detect data corruption, please do not hesitate to inform support, even if you have incomplete information about the incident.
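One possible way to do such a check is to compare checksums of the source file and its copy. The following is only a minimal sketch in Python, not an official tool: the function names `sha256sum` and `copies_match` are made up for illustration, and SHA-256 is just one reasonable choice of checksum. Note that matching checksums only show that the bytes read back from both files are identical at the time of the check.

```python
import hashlib


def sha256sum(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Reading in chunks keeps memory use small even for multi-GiB files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(source, copy):
    """Return True if both files have the same SHA-256 digest."""
    return sha256sum(source) == sha256sum(copy)
```

For example, after copying a hypothetical file `source.dat` to `copy.dat`, calling `copies_match("source.dat", "copy.dat")` returns `False` if the contents differ. For transfers between machines, the digest can be computed on each side and the two strings compared.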
The login node login-2 is reopened for use.
We are still evaluating whether adjusting settings on compute nodes is necessary, because so far we have not been able to reproduce data corruption there.
We labelled this as a temporary fix because additional measures may be necessary.
Original post: Since this morning, there are again cases of data corruption on login-2 on Betzy. Even simply copying a file within the parallel filesystem (/cluster/…) resulted in data corruption in about 5-10% of the cases, where each case copied a 10 GiB file. We are investigating the issue and are in contact with the vendor. Until we find a permanent solution or a workable temporary fix, users are kindly asked not to use login-2.
We are sorry for the inconvenience.
Update, 12:00 2020-09-29: The login node is up and running and available again.
We need to reboot login-1-2.FRAM. We plan to do that at 10:00 am today, September 29th. Please stop all running programs and log out by then. Sorry for the inconvenience.