[Solved] Saga file system performance issue

We’re aware of ongoing issues with the file system performance on Saga and are investigating the cause. This also affects logging in to Saga, where the terminal will hang waiting for a prompt.

Updates will be provided in this post as soon as we have more information to share.

Sorry for the inconvenience.

Update 2021-07-15, 16:33: The issue was identified as a faulty connection between the storage server and the cluster. Performance should be back to normal, but we will monitor the system a bit more before declaring it healthy.
Update 2021-07-14, 15:00: We’ve discovered some faulty drives that are currently being swapped. We hope that the performance will improve once these are in production again.
Update 2021-07-13, 10:03: The file system is a bit more stable now, but we’re still looking into the cause for the degraded performance.

Service not activated in NIRD Service Platform

We regret to inform you that, due to a recent change made by Feide in response to the new national security directives in the sector, you might no longer be able to launch services on the NIRD Toolkit. The reason is that, from now on, your institution must approve any service that requires Feide login. If a service is not approved, you cannot access it with your Feide account. Unfortunately, this approval cannot be granted when services are deployed dynamically and on demand, as they are in the NIRD Toolkit.

What can I do? 

If you are experiencing a problem with using the NIRD Toolkit, we advise you to email the Feide administrator at your institution with us in CC (sigma2@uninett.no).  

If this takes time and you have an urgent need to use the NIRD Toolkit, there is a workaround (a little cumbersome but only temporary) to mitigate the problem, described here: 

Deploy a service through the NIRD Toolkit – Service not activated 

More information about the changes 
 

You can read more about the changes Feide has made in this article on www.feide.no (in Norwegian).  

We are currently working with Feide to resolve the issue. The intended solution is to allow automatic approval of all services deployed through the NIRD Toolkit if the NIRD Toolkit service itself is approved. In the meantime, some organisations have already dealt with this problem by choosing the “Opt-in” option, thereby approving all Feide services. This is the temporary solution suggested by Feide, and we will contact your organisation’s Feide administrator to inform them about this option.

Please note that Sigma2 was not notified of the changes, and therefore we could not inform you beforehand. Apologies for the inconvenience this may have caused!  

This post will be used to provide updates as we have more information available.

Slow file system on Saga

We’re experiencing a very slow file system on Saga at the moment and are working on identifying the cause.

Update 13:09: The file system is much more responsive now, but we’re still seeing that logins are hanging for ~30 seconds before getting access to the file system. This is being investigated further.

Updates will be provided once we have more information.

Sorry about the inconvenience.

Documentation pages unavailable

Our documentation is currently unavailable due to a major outage at an upstream provider that our solution relies on.

We’re sorry about the inconvenience. Please do not hesitate to contact support if you have any questions.

Update 12:58: The provider has implemented a fix and is currently monitoring the changes. Our documentation is available again.

[SOLVED] Problem with logins on Betzy

There is currently an issue with LDAP on Betzy, which means that logins will be rejected.

We’ve identified the cause and are working on resolving the problem.
This post will be updated when we have new information to share.

Sorry about the inconvenience!

Users that have logged in earlier can keep trying to log in, as it should eventually work.
Newly created user accounts unfortunately might not be able to log in before this issue is resolved.

Update 26.03, 12:15 – The problem has been solved. It should now be possible to log in and run jobs as normal on Betzy.

Update 25.03, 13:45 – The vendor is working on the LDAP issue right now; regular logins might be disrupted.

Update 19.03, 13:39 – We’re still looking into this with the vendor, which has escalated the issue. It has been identified that this also affects newly created user accounts on the system, which might not be able to log in at all.
Update 17.03, 16:25 – Unfortunately the issue still exists. We have contacted the vendor to find a solution as soon as possible.
Update 17.03, 12:20 – No resolution on this just yet, though we have identified a potential cause for the problem and are working on getting a fix implemented.
Update 17.03, 09:51 – We’re seeing an increase in failed logins, though it appears to be a little inconsistent. If you’re experiencing this, trying again should work in most cases. We are investigating the cause of these issues.
Update 16.03, 10:26 – The problem is now solved and we’ll monitor the fix throughout the day.