Downtime Betzy, 6th December 08:00 until 9th December 08:00

Betzy will have a scheduled three-day downtime, from Monday 6th December at 08:00 until Thursday 9th December at 08:00.

During the downtime we will conduct:

  • Full upgrade of the Lustre filesystem (both servers and clients)
  • Full upgrade of the InfiniBand firmware
  • Full upgrade of the Mellanox InfiniBand drivers
  • Minor updates to other parts of the system (Slurm, configs, etc.)

Please be aware that this also affects the storage services recently moved from NIRD to Betzy.

We apologize for the inconvenience.

[Updated] Batch system issue on Betzy

There is currently an issue with the batch system on Betzy, which results in jobs not completing and new jobs not being started.

We are currently investigating the issue and will update once we know what caused it and how it can be resolved.

[Update 14:22]: Job submission is working again. The users who experienced this were unfortunately affected by a batch system restart that happened at the same time as their jobs were submitted.

Betzy: Network problems

[2021-06-23 14:26] The issue is now solved and the jobs have started to run. Please report any further issues to support@metacenter.no.

[2021-06-23 09:20] We are again experiencing problems on Betzy. We will update here when we’ve solved the issue.

[2021-06-22 11:15] The problem has been located and fixed, and Betzy should work as normal again.

[2021-06-22 09:30] We are currently experiencing network problems on Betzy. We don’t know the full extent of it, but it is at least affecting the queue system, so all Slurm-related commands are hanging.

We are investigating, and will update when we know more.

[SOLVED] Betzy Downtime 7th June 15:00-20:00

[UPDATE, 2021-06-08 08:00] Betzy is now up and in production again.

[UPDATE] Unfortunately, the downtime is taking longer than anticipated, and will not be finished tonight. We plan on getting Betzy up again at around 08:00 tomorrow morning.

Campusservice at NTNU will conduct maintenance on the high-voltage circuits for non-redundant power on 7th June 2021, between 15:00 and 20:00. All compute nodes and login nodes will be shut down, and no jobs will run during this period. Submitted jobs estimated to run into the downtime reservation will be held in the queue.
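
To illustrate which jobs are held, here is a minimal sketch of the overlap check, assuming a job is held whenever its earliest possible start plus its requested walltime reaches into the downtime window. This is a simplified model, not the batch system's actual scheduling logic, and the function name `would_be_held` is hypothetical.

```python
from datetime import datetime, timedelta

# Downtime window taken from the announcement (local time).
DOWNTIME_START = datetime(2021, 6, 7, 15, 0)
DOWNTIME_END = datetime(2021, 6, 7, 20, 0)


def would_be_held(earliest_start: datetime, walltime: timedelta) -> bool:
    """Return True if the job's requested time span overlaps the downtime.

    Simplified assumption: the job runs for its full requested walltime,
    starting at earliest_start.
    """
    estimated_end = earliest_start + walltime
    return earliest_start < DOWNTIME_END and estimated_end > DOWNTIME_START


if __name__ == "__main__":
    start = datetime(2021, 6, 7, 9, 0)
    for hours in (4, 8):
        held = would_be_held(start, timedelta(hours=hours))
        print(f"{hours} h job starting {start:%H:%M}: held = {held}")
```

Under this assumption, requesting a shorter walltime that ends before 15:00 lets a job start before the downtime instead of waiting in the queue.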

[SOLVED] Problem with logins on Betzy

There is currently an issue with LDAP on Betzy, which means that logins will be rejected.

We’ve identified the cause and are working on resolving the problem.
This post will be updated when we have new information to share.

Sorry about the inconvenience!

Users that have logged in earlier can keep trying to log in, as it should eventually work.
Newly created user accounts unfortunately might not be able to log in before this issue is resolved.

Update 26.03, 12:15 – The problem has now been solved. It should be possible to log in and run jobs as normal on Betzy.

Update 25.03, 13:45 – The vendor is working on the LDAP issue right now; regular logins might be disrupted.

Update 19.03, 13:39 – We’re still looking into this with the vendor, who has escalated the issue. It has been identified that this also affects newly created user accounts on the system, which might not be able to log in at all.
Update 17.03, 16:25 – Unfortunately the issue still exists. We have contacted the vendor to find a solution as soon as possible.
Update 17.03, 12:20 – No resolution on this just yet, though we have identified a potential cause for the problem and are working on getting a fix implemented.
Update 17.03, 09:51 – We’re seeing an increase in failed logins, though it appears to be a little inconsistent. If you’re experiencing this, trying again should work in most cases. We are investigating the cause of these issues.
Update 16.03, 10:26 – The problem is now solved and we’ll monitor the fix throughout the day.