Over the past three weeks we’ve seen a number of posts about the future of using LLMs for bug discovery and exploitation (https://www.forbes.com/sites/jonmarkman/2026/04/08/what-is-claude-mythos-and-why-anthropic-wont-let-anyone-use-it/, https://www.forbes.com/sites/amirhusain/2026/04/01/ai-just-hacked-one-of-the-worlds-most-secure-operating-systems/, https://blog.calif.io/p/mad-bugs-claude-wrote-a-full-freebsd). This has led to significant excitement in the popular press. It’s indisputable that new bugs are being found and rapidly exploited, but it is unclear whether we’re seeing a radical shift in bug discovery or even in vulnerability classes, or whether this is simply another bug-finding technique that lets us search niches of the bug space not well covered by prior approaches such as static source-code analysis (e.g., Coverity) or fuzzing combined with sanitizers like ASAN and UBSAN. Only time and analysis of a larger corpus of bugs will tell.
From a CHERI perspective, one of the most interesting bugs is CVE-2026-4747 (https://www.freebsd.org/security/advisories/FreeBSD-SA-26:08.rpcsec_gss.asc) because the code in question exists in our CHERI-enabled CheriBSD operating system – so we can easily exercise it. A security researcher in the CHERI group at the University of Cambridge has analysed publicly available exploits and produced a proof of concept (POC) performing the first step of the attack (a classic stack buffer overflow). On CHERI, strong C/C++ memory safety deterministically traps with a bounds violation when the attacker attempts to trigger the stack buffer overflow, before any memory corruption can occur.
CHERI’s bounds fault converts a remote code execution (RCE) vulnerability into a denial-of-service vulnerability, downgrading the impact from Critical to High. An unexpected reboot is strongly preferable to total attacker control of a system. It is further reasonable to speculate that appropriate compartmentalization or failure-oblivious computing strategies might allow the RPCSEC_GSS service to fail gracefully rather than forcing the kernel to exit as it does today. This is an area of active research for the CHERI team.
The bad news is that we’re likely to see more of this and that tools will be able to reach deeper into old code over time. The good news is that CHERI trivially blocks this attack and likely many others. As LLM-driven discovery accelerates, the case for memory safety by design becomes stronger.