Security incident, copy.fail

Major incident · Affected: High Performance Computing, Betzy, Saga, Storage Services, NIRD Data Peak, NIRD Data Lake, NIRD Service Platform, NIRD Research Data Archive, easyDMP, Web services, Olivia
Started 2026-04-30 11:30 CEST · Duration: 4 days, 20 hours, 57 minutes

Updates

Update

On 30 April 2026, Sigma2 handled a high-severity security incident related to the Linux kernel vulnerability CVE-2026-31431 (“Copy Fail”), which allows local privilege escalation from a regular user to root. Due to the shared, multi-tenant nature of Sigma2’s HPC systems, immediate mitigation actions were taken. These included temporarily disabling or restricting access to login nodes, draining and rebooting compute nodes, and applying mitigations across the environment. All user-facing nodes required verification and remediation. During this period, users experienced login disruptions, job requeueing, and reduced availability while nodes were progressively mitigated and returned to service. Importantly, no breaches, data compromise, or malicious activity have been reported or detected; the response was precautionary to eliminate the risk window associated with this vulnerability.

For Service Platform (SP) users, the incident stems from the same underlying kernel vulnerability affecting Linux systems (CVE-2026-31431), which can undermine isolation guarantees if left unpatched. While SP workloads are typically containerised and more isolated than traditional HPC jobs, they still depend on shared underlying compute and orchestration infrastructure. As a result, Sigma2 applied similar precautionary measures to ensure the integrity of SP environments, including mitigations and controlled restarts where needed. This caused temporary service interruptions, reduced availability, and delayed workloads on the platform. As on the HPC side, however, no evidence of exploitation or impact on user data or services has been observed, and all actions were taken proactively to maintain system security and trust.

May 6, 2026 · 13:31 CEST
Resolved

The security incident has been fixed and resolved. All systems are back in full production.

May 5, 2026 · 08:27 CEST
Update

Saga, Betzy and Olivia have now been patched and are back in production.

April 30, 2026 · 19:42 CEST
Investigating

Due to the extent of this incident, we have also shut down the Service Platform.

April 30, 2026 · 13:42 CEST
Issue

Security Incident [Updated]: Temporary Service Restrictions

We are currently handling a security vulnerability affecting Linux systems.
As a precaution, we have temporarily restricted access to some systems (including login nodes) while we apply security updates and perform the necessary checks.

What we are doing:

  • Applying security patches to affected systems
  • Rebooting nodes to ensure the fix is fully in effect
  • Verifying system integrity before restoring access

What this means for you:

  • You may experience temporary login or service disruptions
  • Running jobs may be delayed or interrupted in some cases

What you should do:

  • No immediate action is required from most users
  • If you had active sessions, please reconnect after services are restored
  • As a general good practice, avoid reusing passwords and keep your credentials secure

At this time, there is no indication of misuse of user data, but we are proceeding with caution due to the nature of the vulnerability.
We will restore access as soon as systems are confirmed safe.

Thank you for your patience.

April 30, 2026 · 12:20 CEST
