We are experiencing problems with some jobs on Betzy either not starting, being killed, or falling into a zombie state, where the calculation fails but the job continues until its time limit runs out.
This might have been connected to a reboot of one of Betzy's IO servers. We are investigating and will update this message once we know more.
[Update, 2022-05-30 15:20] Our downtime is over, Saga is open, and jobs are running. Please let us know if you encounter any problems.
[Update, 2022-05-30 09:00] Maintenance stop has now started.
There will be a maintenance break on Saga to apply patches to the file system. Jobs will not run on 30.5.22 from 8:00 until some time after lunch. Jobs can still be submitted and, if short enough, will run before the break. Otherwise they will start automatically once the maintenance break is finished.
Update 01.04.22: File system performance seems to be back to normal.
We are experiencing long login times on the Saga login nodes, and poor data access and transfer speeds across the whole cluster. We are working to find the cause and fix it.