Navigating API Violations and Hyrum's Law: A Kernel Developer's Guide to Restartable Sequences and TCMalloc

Published 2026-05-03 09:49:09 · Programming

Introduction

Hyrum's Law warns that any observable behavior of a system, no matter how incidental, will eventually become something a user depends on. The Linux kernel community recently experienced this firsthand when changes to restartable sequences (rseq) in the 6.19 release, though fully compliant with the documented API, broke Google's TCMalloc library. TCMalloc had been relying on undocumented behavior, and the way it used rseq also prevented other code from using restartable sequences. The kernel's strict no-regressions rule compelled developers to find a way to accommodate TCMalloc's violations. This guide walks through the process of handling such conflicts, using the rseq and TCMalloc case as the central example.

What You Need

  • Knowledge of kernel development: Familiarity with the Linux kernel source tree, patch submission, and review processes.
  • Understanding of restartable sequences: Know how rseq works as a mechanism for per-CPU operations without atomic instructions.
  • Hyrum's Law awareness: Recognize that all observable behaviors—even bugs—can become de facto APIs.
  • Familiarity with TCMalloc: Understand that this memory allocator uses restartable sequences and may deviate from documented specifications.
  • Access to kernel 6.19 source: For testing and applying changes.
  • Testing environment: A system where you can run workloads that exercise restartable sequences and memory allocation.

Step-by-Step Guide

Step 1: Document the Official API

Begin by reviewing the restartable sequences API as the kernel defines it, for example in the UAPI header include/uapi/linux/rseq.h and the rseq documentation in the kernel tree. Identify the exact set of fields, flags, and behaviors that the kernel guarantees. This becomes your baseline: any change must preserve these guarantees under the no-regressions rule. In the 6.19 case, the kernel team maintained this documented contract, yet still encountered breakage.
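
One way to make such a baseline concrete is to write the layout down in executable form. The sketch below mirrors the leading fields of struct rseq with Python's ctypes and records which side is expected to write each field. The field list and the "writer" column reflect one reading of include/uapi/linux/rseq.h and are assumptions to check against your own tree, not an authoritative copy of the ABI.

```python
import ctypes


class Rseq(ctypes.Structure):
    """Mirror of the leading fields of struct rseq (assumed layout)."""
    _fields_ = [
        ("cpu_id_start", ctypes.c_uint32),
        ("cpu_id", ctypes.c_uint32),
        ("rseq_cs", ctypes.c_uint64),
        ("flags", ctypes.c_uint32),
        ("node_id", ctypes.c_uint32),
        ("mm_cid", ctypes.c_uint32),
    ]


# Which side is expected to write each field (assumption, taken from a
# reading of the header comments; verify before relying on it).
WRITER = {
    "cpu_id_start": "kernel",
    "cpu_id": "kernel",
    "rseq_cs": "user space",
    "flags": "user space",
    "node_id": "kernel",
    "mm_cid": "kernel",
}


def baseline():
    """Return {field: (offset, size, writer)} for the mirrored layout."""
    table = {}
    for name, _ctype in Rseq._fields_:
        descriptor = getattr(Rseq, name)
        table[name] = (descriptor.offset, descriptor.size, WRITER[name])
    return table


for name, (offset, size, writer) in baseline().items():
    print(f"{name:13} offset={offset:2} size={size} written by {writer}")
```

Keeping a table like this next to a proposed patch makes it easier to state precisely which guarantees the patch preserves and which behaviors were never promised.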

Step 2: Identify Unintended Dependencies

Apply Hyrum's Law by scanning for code that may rely on incidental or undocumented behaviors. For restartable sequences, this includes checking userspace libraries like TCMalloc. Use tools such as git grep on the library's source, together with manual code review, to find usage patterns that extend beyond the official spec. TCMalloc, for instance, depended on manipulating rseq data structures in ways that were not part of the documented API.
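
A scan of this kind can be partly automated. The following is a deliberately crude sketch: it flags source lines that appear to assign to rseq fields that, on the assumption stated in the comment, only the kernel should write. The field list, the regular expression, and the sample C snippet are all illustrative; real code may write through atomics, memcpy, or pointer casts, which this heuristic will miss.

```python
import re

# Fields that only the kernel is expected to write (assumption based on a
# reading of the rseq UAPI header; adjust to your own baseline).
KERNEL_OWNED = ("cpu_id_start", "cpu_id", "node_id", "mm_cid")

# Matches "x.field =", "x->field |=" and similar, but not "==" comparisons.
_WRITE = re.compile(
    r"(?:->|\.)\s*(" + "|".join(KERNEL_OWNED) + r")\b\s*(?:<<|>>|[-+*/%|&^])?=(?!=)"
)


def find_suspect_writes(source):
    """Return (line_number, field) for each apparent write to a kernel-owned field."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _WRITE.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Hypothetical C fragment, written only to exercise the scanner.
SAMPLE = """\
if (rs->cpu_id_start == cpu) fast_path();
rs->cpu_id_start = cached_value;      /* hypothetical violation */
rs->rseq_cs = (uintptr_t)&cs;         /* user-owned field, not flagged */
"""

print(find_suspect_writes(SAMPLE))  # prints [(2, 'cpu_id_start')]
```

Hits from a scanner like this are only leads for code review; each one still needs a human to decide whether it is a real dependency on undocumented behavior.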

Step 3: Assess the Impact of Changes

When you propose a kernel change, evaluate how it affects known dependent libraries. Run regression tests with TCMalloc and other consumers. In the 6.19 scenario, even though the API remained intact, the change altered internal memory layout, causing TCMalloc to misinterpret rseq state and break. Separately, TCMalloc's non-compliant usage monopolized the rseq mechanism, which kept other code from using restartable sequences.
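
The general failure pattern is easy to reproduce in a toy model. The sketch below is not the actual TCMalloc mechanism; it is a hypothetical illustration in which a "kernel" changes from always rewriting a field on resume to rewriting it only when the CPU changed. A consumer that only reads the field sees no difference, while a consumer that overwrites the field and waits for the kernel to restore it breaks. All names here are invented for the example.

```python
SENTINEL = 0xFFFFFFFF  # value a hypothetical library parks in a kernel-owned field


class RseqArea:
    """Stand-in for the user-visible rseq area (one field is enough here)."""

    def __init__(self):
        self.cpu_id_start = 0


def resume_always_write(area, old_cpu, new_cpu):
    """Old behavior (model): rewrite the field every time the thread resumes."""
    area.cpu_id_start = new_cpu


def resume_write_on_change(area, old_cpu, new_cpu):
    """New behavior (model): skip the write when the CPU did not change."""
    if new_cpu != old_cpu:
        area.cpu_id_start = new_cpu


def compliant_reader(resume):
    """Only reads the field, which the documented contract allows."""
    area = RseqArea()
    resume(area, old_cpu=0, new_cpu=0)
    return area.cpu_id_start


def noncompliant_user_sees_reset(resume):
    """Overwrites the field, then expects the kernel to restore it on resume."""
    area = RseqArea()
    area.cpu_id_start = SENTINEL
    resume(area, old_cpu=0, new_cpu=0)  # preempted, resumed on the same CPU
    return area.cpu_id_start != SENTINEL


for resume in (resume_always_write, resume_write_on_change):
    print(resume.__name__, compliant_reader(resume), noncompliant_user_sees_reset(resume))
```

Both kernels satisfy the compliant reader, so a test suite built only from the documented contract would pass. Only a test that includes the non-compliant consumer exposes the regression, which is why real dependent libraries belong in the assessment.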

Step 4: Apply the No-Regressions Rule

The kernel's no-regressions rule states that new changes must not break existing userspace programs. When a regression is detected (as with TCMalloc), you have two paths:

  • Revert the change (often undesirable because it may be needed for improvements).
  • Find a compromise that accommodates the unintended dependency.

In our case, the kernel developers chose to keep the change but add compatibility code to handle TCMalloc's violations.

Step 5: Design Workarounds for Non-Compliant Users

Create workarounds that detect and emulate the old undocumented behavior. This may involve adding new flags or fallback paths. For TCMalloc, the workaround involved detecting when a process was using rseq in a non-standard way and falling back to the previous handling of rseq state. Ensure the workaround is transparent to other users and does not weaken the intended improvements. Test that TCMalloc works again and that other rseq users still get the benefit of the change.
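
The detect-and-fall-back pattern can be sketched in a few lines. The model below is hypothetical and is not the mechanism the kernel developers adopted; the detection rule (the field no longer holds what the "kernel" last wrote) and every name in it are assumptions chosen to keep the example small.

```python
class Thread:
    """Per-thread state in this toy model (all names hypothetical)."""

    def __init__(self):
        self.cpu_id_start = 0     # stands in for the user-visible rseq field
        self.last_written = 0     # what the model kernel last stored there
        self.legacy_mode = False  # sticky: set once a violation is observed


def on_resume(thread, new_cpu):
    """Model resume path: optimized by default, old behavior for violators.

    Returns True if the field was written, False if the write was skipped.
    """
    if thread.cpu_id_start != thread.last_written:
        # User space modified a kernel-owned field: fall back permanently.
        thread.legacy_mode = True
    if thread.legacy_mode or new_cpu != thread.last_written:
        thread.cpu_id_start = new_cpu
        thread.last_written = new_cpu
        return True
    return False


compliant = Thread()
print(on_resume(compliant, 0))   # same CPU, write skipped
print(on_resume(compliant, 1))   # CPU changed, field written

violator = Thread()
violator.cpu_id_start = 0xFFFFFFFF  # hypothetical misuse of the field
print(on_resume(violator, 0))    # violation detected, old behavior restored
print(violator.legacy_mode)
```

The point of the design is that compliant threads keep the optimized path, while a thread that has been seen violating the contract pays the old cost from then on. A real implementation would also have to weigh the cost of the detection itself.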

Step 6: Validate Compliance and Performance

After implementing the workaround, run thorough tests:

  • Verify that TCMalloc works correctly with the 6.19 kernel.
  • Confirm that other users of restartable sequences (e.g., glibc, librseq) function without issues.
  • Measure performance to ensure the workaround doesn't degrade the original improvements.

Use the kernel's test suite and userspace test programs. Monitor for any new regressions.
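
A small driver script helps keep such a matrix repeatable. The sketch below runs a list of named commands and records whether each one exited successfully. The commands shown are placeholders that only invoke the Python interpreter; in practice you would substitute your own test binaries, for example programs built from tools/testing/selftests/rseq or a TCMalloc-linked workload.

```python
import subprocess
import sys


def run_matrix(cases, timeout=60):
    """Run each (name, argv) case and return {name: passed}."""
    results = {}
    for name, argv in cases:
        try:
            proc = subprocess.run(argv, capture_output=True, timeout=timeout)
            results[name] = proc.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # A missing binary or a hang counts as a failure.
            results[name] = False
    return results


# Placeholder cases: replace with real test programs for your setup.
CASES = [
    ("passes", [sys.executable, "-c", "pass"]),
    ("fails", [sys.executable, "-c", "raise SystemExit(3)"]),
]

for name, passed in run_matrix(CASES).items():
    print(f"{name}: {'ok' if passed else 'FAILED'}")
```

Running the same matrix on the kernel before the change, after the change, and after the workaround gives a direct view of which consumers regressed and which were repaired.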

Step 7: Document the Experience

Add comments and documentation explaining why the workaround exists. Note that TCMalloc's behavior violated the documented API and that Hyrum's Law was in play. This helps future developers understand the design trade-offs and avoid repeating the same conflict. The kernel community often posts detailed commit messages and LKML discussions—contribute to that knowledge base.

Tips for Success

  • Expect the unexpected: Always assume that userspace may rely on kernel internals you consider private. Hyrum's Law is inevitable.
  • Maintain clear documentation: The more precise your API specification, the easier it is to argue when a dependency is invalid.
  • Communicate with userspace maintainers: In the TCMalloc case, early dialogue with Google could have surfaced the violation sooner.
  • Use feature detection: Provide mechanisms for userspace to query supported behaviors, so libraries can adapt.
  • Test widely: Include unusual setups and third-party libraries in your regression test matrix.
  • Be pragmatic: The no-regressions rule is strict, and sometimes the cleanest fix is to accommodate the violator gracefully while continuing to improve the API.
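
As an illustration of the feature-detection tip, user space can ask the kernel what it supports instead of guessing. The sketch below reads the process's auxiliary vector from /proc/self/auxv and looks up the rseq feature size and alignment entries. The constant values are taken from memory of include/uapi/linux/auxvec.h and should be treated as assumptions to verify; the function names are invented for this example.

```python
import struct

AT_NULL = 0
AT_RSEQ_FEATURE_SIZE = 27  # assumed value, check include/uapi/linux/auxvec.h
AT_RSEQ_ALIGN = 28         # assumed value, same header

_ENTRY = struct.Struct("@LL")  # one auxv entry: two native unsigned longs


def parse_auxv(raw):
    """Decode raw auxv bytes into {type: value}, stopping at AT_NULL."""
    entries = {}
    for offset in range(0, len(raw) - _ENTRY.size + 1, _ENTRY.size):
        key, value = _ENTRY.unpack_from(raw, offset)
        if key == AT_NULL:
            break
        entries[key] = value
    return entries


def rseq_features(path="/proc/self/auxv"):
    """Return (feature_size, alignment), or None if they are not reported."""
    try:
        with open(path, "rb") as f:
            auxv = parse_auxv(f.read())
    except OSError:
        return None
    if AT_RSEQ_FEATURE_SIZE not in auxv:
        return None
    return auxv[AT_RSEQ_FEATURE_SIZE], auxv.get(AT_RSEQ_ALIGN)


print(rseq_features())
```

A library that checks values like these at startup can adapt to what the running kernel offers, rather than depending on behavior it merely observed on one kernel version.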