We were affected by Copy Fail (CVE-2026-31431). We patched it for you. Here's the full story.

Jan S.
May 1, 2026 · 6 min read

On April 29, Theori disclosed CVE-2026-31431 — dubbed "Copy Fail." A 732-byte Python script. Deterministic root on essentially every Linux system shipped since 2017. No race conditions, no kernel offset guessing, no per-distro adjustments. Just root.

What Copy Fail actually does

algif_aead is a kernel module that exposes the crypto API to userspace via AF_ALG sockets. A 2017 optimization made AEAD operations work in-place, putting page cache pages into a writable scatterlist during crypto operations. Combined with splice(), an unprivileged process gets a controlled 4-byte write directly into the kernel's page cache — the in-memory copy of any readable file on the system.

Point those 4 bytes at /usr/bin/su. Run su. Root shell.

If you remember Dirty Pipe (CVE-2022-0847), Copy Fail is worse. Dirty Pipe required precise pipe buffer manipulation and version-specific targeting. Copy Fail is a straight-line logic flaw — it executes cleanly on every tested distribution without adjustment.

Three properties make it particularly dangerous for shared-host environments:

  • Deterministic. It works the first time, every time.
  • Cross-tenant. The page cache is shared host-wide. A write from inside one container reaches the host and every other container on that node.
  • Invisible. The corruption exists only in memory. Disk forensics show the original file untouched.
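
The third property suggests a detection angle. Because the corrupted page is never written back, the on-disk copy stays clean; hashing a file, dropping the clean page cache, and hashing again exposes an in-memory-only tamper as a mismatch. This is a sketch, not a forensic tool: `check_cache_vs_disk` is a hypothetical helper name, dropping caches requires root, and it assumes the tampered page is discarded (not written back) when the cache is dropped.

```bash
#!/usr/bin/env sh
# Sketch: hash a file's cached copy, drop the clean page cache so the next
# read comes from disk, and hash again. A mismatch suggests in-memory-only
# tampering. check_cache_vs_disk is a hypothetical helper; root is needed
# to drop caches, otherwise both reads hit the same cached copy.
check_cache_vs_disk() {
  f="$1"
  before=$(sha256sum "$f" | cut -d' ' -f1)        # cached (possibly tampered) copy
  sync
  echo 1 > /proc/sys/vm/drop_caches 2>/dev/null \
    || echo "note: need root to drop caches" >&2  # without root, reads stay cached
  after=$(sha256sum "$f" | cut -d' ' -f1)         # on-disk copy after cache drop
  if [ "$before" = "$after" ]; then
    echo "match: $f"
  else
    echo "MISMATCH (possible in-memory tamper): $f"
  fi
}

check_cache_vs_disk /usr/bin/su
```

On an untampered system the two hashes agree and the helper prints `match`.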

CERT-EU and Theori both named container platforms, CI runners, and multi-tenant Linux environments as priority targets. That includes us.

Yes, Zerops was affected

We're going to be straight with you: Zerops runs on shared-kernel Incus system containers. That's a deliberate architectural choice — it gives you bare-metal performance, a full Linux userspace, and the density that keeps your costs down. The tradeoff is that the kernel is shared. When there's a kernel bug, the host kernel is the thing that needs patching.

Copy Fail was exactly that kind of bug.

The difference between a managed platform and running your own infrastructure is what happens next.

13.5 hours from awareness to a fully patched us-east

We became aware of the vulnerability at 23:00 CEST on April 29, within hours of public disclosure. By 08:30 CEST on April 30, emergency maintenance was live and the first nodes were rebooting with the patched kernel.

Here's why we could move that fast: Zerops runs on the Zabbly mainline kernel, which tracks the latest stable mainline Linux. The fix (commit a664bf3d603d) had landed in mainline on April 1 — nearly a month before the CVE went public. By the time the world learned about Copy Fail, the patched kernel was already built and waiting in our pipeline.

We rolled node by node, region by region. Our community got real-time updates at each stage:

  1. us-east — fully patched by 12:30 CEST, April 30.
  2. eu-central — fully patched by 18:30 CEST, May 1.

Partway through, we made a deliberate call to slow the rollout down. The initial plan was aggressive — patch everything as fast as possible. But a slower pace, with smaller waves and longer gaps between nodes, meant fewer non-HA services interrupted at any one time. We'd rather take longer and cause less disruption than race through and hit everyone at once.

HA services saw zero downtime. Traffic moved to other replicas while each node rebooted. Non-HA services experienced a brief interruption — typically around 30 seconds — and came back automatically.

Every node across all regions is now running the patched kernel.

What your day looked like on a VPS

If you were running your own server, here's how the week went:

  1. Find out about the CVE. Hopefully not from an attacker.
  2. Figure out whether your kernel is affected. (It was — every mainline kernel since 4.14.)
  3. Find out whether your distro has shipped the patch yet. Some had. Some hadn't. Red Hat initially said they'd defer the fix, then reversed course days later — leaving anyone relying on RHEL-based kernels exposed in the meantime.
  4. Schedule a maintenance window. Coordinate with your team. Warn your users.
  5. Patch the kernel. Reboot the server. Pray nothing breaks.
  6. Repeat for every server you run.
  7. Hope you got to it before anyone else on a shared host exploited it.

If you have one server, that's an afternoon. If you have ten, it's a day. If you have fifty and they're running production traffic, it's a project.
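
The triage in steps 2 and 3 can be scripted. A minimal sketch for one host, with one caveat loudly stated: whether your exact kernel build already contains the backported fix has to come from your distro's advisory, since a version string alone won't tell you.

```bash
#!/usr/bin/env sh
# Minimal per-host triage sketch: surface the facts steps 2-3 need.
# It cannot tell you whether a distro backport of the fix has landed;
# check your vendor's advisory for that.
kver=$(uname -r)
echo "kernel: $kver"

if grep -qs '^CONFIG_CRYPTO_USER_API_AEAD=y' "/boot/config-$kver"; then
  echo "algif_aead is built in: only a kernel update helps"
elif lsmod 2>/dev/null | grep -q '^algif_aead'; then
  echo "algif_aead module is loaded: blacklist it until you can patch"
else
  echo "algif_aead not currently loaded (it can still be auto-loaded on demand)"
fi
```

Loop it over your inventory with ssh and step 6 at least becomes a survey rather than fifty manual logins.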

And if you're on another managed platform — ask them the same questions. When did they patch? What was their exposure window? Could one tenant have written into another's page cache before the fix landed?

Tracking mainline matters

Many platforms and distros run older kernel branches and backport fixes. That means waiting for the vendor to pick up the patch, test it, and ship it. Debian, Ubuntu, and SUSE had patches out within a day. Others took longer. For anyone on a backport-based kernel, the window between "fix exists in mainline" and "fix is in your running kernel" is exposure.

Zerops tracks the Zabbly mainline kernel specifically to avoid that lag. When a fix lands in mainline, we have it. For Copy Fail, the patch was sitting in our kernel repository for nearly a month before the vulnerability was publicly disclosed. That's not luck — it's an infrastructure decision we made years ago, and weeks like this one are why.

What you should do

If you're on Zerops, you're already patched — no action needed on your end.

If you manage your own Linux servers:

  1. Patch to mainline commit a664bf3d603d. Check your distro's kernel updates.
  2. If you can't patch immediately, disable the module:

    ```bash
    # Prevent algif_aead from being auto-loaded, then unload it if present.
    echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf
    rmmod algif_aead 2>/dev/null || true
    ```

    Caveat: if your kernel has CONFIG_CRYPTO_USER_API_AEAD=y (built in, not a module), these commands run without errors but do nothing; you need a kernel update. Check with grep CONFIG_CRYPTO_USER_API_AEAD /boot/config-$(uname -r).
  3. For container platforms running untrusted workloads: block AF_ALG socket creation with seccomp, even after patching. Almost nothing legitimate uses it.
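
For step 3, here is a sketch of what that seccomp rule can look like as a Docker/OCI profile, assuming the standard profile schema: AF_ALG is socket domain 38 on Linux, and SCMP_ACT_ERRNO makes the call fail with a permission error. The blanket `defaultAction: SCMP_ACT_ALLOW` is a placeholder so the fragment stands alone; in practice, merge the `syscalls` entry into your runtime's default profile.

```bash
# Sketch: deny socket(AF_ALG, ...) via a Docker/OCI seccomp profile.
# AF_ALG is domain 38 on Linux. The permissive defaultAction is only here
# to keep the example self-contained; merge the rule into your real profile.
cat > block-af-alg.json <<'EOF'
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "syscalls": [
    {
      "names": ["socket"],
      "action": "SCMP_ACT_ERRNO",
      "args": [
        { "index": 0, "value": 38, "op": "SCMP_CMP_EQ" }
      ]
    }
  ]
}
EOF
# Usage sketch: docker run --security-opt seccomp=block-af-alg.json ...
```

Because the rule matches on the first `socket()` argument, legitimate TCP/UDP/Unix sockets are untouched; only AF_ALG creation fails.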

HA exists for exactly this reason

Most of the time, you don't think about kernel security. It doesn't affect your day-to-day. Then a week like this happens, and it's the only thing that matters.

HA services on Zerops rode through this entire emergency maintenance with zero disruption. Traffic shifted to other replicas while each node rebooted. Nodes came back. Traffic shifted back. No pages, no alerts, no apologies to your users.

Non-HA services came back quickly and automatically — but there was still a blip. If you've been thinking about enabling HA on a critical service, this is the kind of event that makes the case concretely: set your minimum containers to 2, and the next kernel emergency is something you read about the next morning instead of something that wakes you up.

Try Zerops in < 5 minutes without installing or forking anything — deploy one of our curated recipes, which are examples covering the full development lifecycle from local dev through staging to production, with built-in seeding, migrations, backups and scaling.

Your favorite language / framework not covered by our recipes? Deploy the Showcase recipe, or join Zerops Discord and we'll help you out.

"Hello World" Examples Bun Python

Production-ready image processing pipeline showcasing distributed architecture on Zerops — a Bun + React frontend with real-time WebSocket updates and live architecture visualization, a Python worker for async image processing via NATS, backed by PostgreSQL, Valkey, and S3-compatible object storage.

Project core: Lightweight (Free): 100 GB egress, 15 build hours, 5 GB backup storage, single container.

  • app:3000 (Bun): 1 shared core, 0.5 GB RAM, 1 GB disk
  • worker (Python): 1 shared core, 0.5 GB RAM, 1 GB disk
  • db:5432:6432 (PostgreSQL): 1 shared core, 0.25 GB RAM, 1 GB disk
  • redis:6379:6380 (Valkey): 1 shared core, 0.125 GB RAM, 1 GB disk
  • queue:4222:8222 (NATS): 1 shared core, 0.25 GB RAM, 1 GB disk
  • storage (object storage): external, 5 GB disk

Total: 7 shared cores, 2.625 GB RAM, 7 GB disk (SSD), 5 GB object storage. $12.83 per month.