OpenMythos: Exploring recurrent-depth transformers with input injection for sustained reasoning

OpenMythos tackles a persistent issue in transformer scaling: how to deepen the model’s effective reasoning without stacking hundreds of unique layers. Instead of traditional depth, it recycles a subset of transformer layers through multiple forward passes in a looped recurrent block. The key architectural innovation is an input injection mechanism that keeps the input signal alive throughout the recurrence, preventing signal drift that usually plagues recurrent networks during long iterations.

what OpenMythos implements: a recurrent-depth transformer architecture

OpenMythos is an open-source, theoretical implementation of a Recurrent-Depth Transformer (RDT), the architecture the project hypothesizes underpins Claude Mythos. It breaks down into three stages:

Prelude: a few standard transformer layers that process the initial input.
Recurrent Block: a set of layers recycled over multiple iterations (looped), configurable from 16 up to 64 loops depending on model scale.
Coda: final transformer layers that output the model’s predictions.

This looped block approach means the model’s “depth” is effectively dynamic, allowing it to perform implicit chain-of-thought reasoning across recurrence steps in continuous latent space. The recurrence also supports systematic generalization and depth extrapolation through a three-stage grokking process during training.
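The three stages and the reuse of the middle block can be sketched in a few lines of plain Python. This is a minimal illustration, not the OpenMythos code: `rdt_forward`, `run_stack`, `effective_depth`, and the toy `add_one` layer are hypothetical names, layers are modeled as plain functions on a list of floats, and the input injection term discussed later in the article is left out to keep the control flow visible.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def run_stack(layers: List[Layer], h: Vector) -> Vector:
    """Apply each layer once, in order."""
    for layer in layers:
        h = layer(h)
    return h


def rdt_forward(x: Vector, prelude: List[Layer], recurrent: List[Layer],
                coda: List[Layer], n_loops: int) -> Vector:
    """Prelude once, recurrent block n_loops times, coda once."""
    h = run_stack(prelude, x)
    for _ in range(n_loops):
        # The same layer objects (same weights) are reused on every pass.
        h = run_stack(recurrent, h)
    return run_stack(coda, h)


def effective_depth(n_prelude: int, n_recurrent: int, n_coda: int,
                    n_loops: int) -> int:
    """Number of layer applications in one forward pass."""
    return n_prelude + n_loops * n_recurrent + n_coda


def add_one(h: Vector) -> Vector:
    """Hypothetical toy layer: adds 1 so the output counts layer applications."""
    return [v + 1.0 for v in h]


if __name__ == "__main__":
    prelude, recurrent, coda = [add_one, add_one], [add_one], [add_one]
    for loops in (16, 64):
        out = rdt_forward([0.0], prelude, recurrent, coda, n_loops=loops)
        print(loops, out[0], effective_depth(2, 1, 1, loops))
```

Changing `n_loops` changes how many layer applications the input goes through without adding any layers, which is the sense in which depth is "dynamic" here.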

Technically, the model supports switchable attention mechanisms, including Multi-head Latent Attention (MLA) and Grouped Query Attention (GQA). It also integrates sparse Mixture of Experts (MoE) layers with routed and shared experts to efficiently scale parameter counts.

The input injection formula is key:

h_{t+1} = A \cdot h_t + B \cdot e + \text{Transformer}(h_t, e)

Here, the hidden state at the next loop iteration combines a linear transformation of the previous hidden state (h_t), a direct injection of the original input embedding (e), and the transformer block’s output. This formula helps maintain signal fidelity across loop iterations, preventing the common issue of signal drift.
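To make the update rule concrete, here is a small sketch of the loop in plain Python. It rests on several assumptions that are not in the source: `A` and `B` are reduced to scalars `a` and `b`, the hidden state starts at zero, and `toy_block` is a hypothetical stand-in for the transformer block rather than anything from OpenMythos.

```python
import math
from typing import List

Vector = List[float]


def toy_block(h: Vector, e: Vector) -> Vector:
    """Hypothetical stand-in for Transformer(h_t, e): a small bounded update."""
    return [0.1 * math.tanh(hi + ei) for hi, ei in zip(h, e)]


def injected_loop(e: Vector, n_loops: int, a: float = 0.5, b: float = 0.5) -> Vector:
    """Iterate h_{t+1} = a * h_t + b * e + block(h_t, e) for n_loops steps."""
    h = [0.0] * len(e)  # assumed initial state; the source does not specify one
    for _ in range(n_loops):
        update = toy_block(h, e)
        # The original embedding e re-enters the state on every iteration.
        h = [a * hi + b * ei + ui for hi, ei, ui in zip(h, e, update)]
    return h


if __name__ == "__main__":
    e = [1.0, -1.0]
    for loops in (1, 16, 64):
        print(loops, injected_loop(e, loops))
```

With these toy settings the state settles instead of wandering: running 64 or 65 loops gives essentially the same vector, and its sign still follows the sign of the injected embedding.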

OpenMythos offers pre-configured model variants from 1 billion to 1 trillion parameters, scaling context windows from 4,000 tokens up to 1 million tokens. Loop iterations increase with model size, from 16 loops in smaller models to 64 in the largest 1T parameter variant. The training target is about 30 billion tokens, adjusted for the looped architecture to reflect effective training complexity.

the technical core: input injection and looped recurrence for sustained reasoning

What sets OpenMythos apart is the recurrent-depth design paired with the input injection mechanism. Most transformer scaling strategies increase depth by stacking unique layers, leading to linear increases in model size and training cost. OpenMythos instead recycles a smaller set of layers multiple times, effectively increasing “depth” without proportionally increasing parameters.
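A rough back-of-the-envelope calculation shows the size of the saving. The sketch below uses the common rule of thumb of about 12 * d_model^2 weights per dense transformer layer (4 * d^2 for the attention projections plus 8 * d^2 for an MLP with 4x expansion), ignoring embeddings, norms, biases, and MoE. The layer counts and `d_model` are hypothetical values chosen for illustration, not OpenMythos configurations.

```python
def layer_params(d_model: int) -> int:
    """Rule-of-thumb dense layer size: 4*d^2 (attention) + 8*d^2 (4x MLP)."""
    return 12 * d_model ** 2


def stacked_params(d_model: int, n_layers: int) -> int:
    """Conventional model: every layer has its own weights."""
    return n_layers * layer_params(d_model)


def looped_params(d_model: int, n_prelude: int, n_recurrent: int, n_coda: int) -> int:
    """Looped model: recurrent layers are stored once, however often they run."""
    return (n_prelude + n_recurrent + n_coda) * layer_params(d_model)


def layer_applications(n_prelude: int, n_recurrent: int, n_coda: int, n_loops: int) -> int:
    """Layers executed in one forward pass."""
    return n_prelude + n_loops * n_recurrent + n_coda


if __name__ == "__main__":
    # Hypothetical configuration, for illustration only.
    d_model, n_prelude, n_recurrent, n_coda, n_loops = 1024, 2, 4, 2, 16
    depth = layer_applications(n_prelude, n_recurrent, n_coda, n_loops)
    looped = looped_params(d_model, n_prelude, n_recurrent, n_coda)
    stacked = stacked_params(d_model, depth)
    print(f"layer applications: {depth}")
    print(f"looped parameters:  {looped:,}")
    print(f"stacked parameters: {stacked:,}")
    print(f"ratio: {stacked / looped:.1f}x")
```

Only the stored weights shrink. Compute per forward pass still scales with the number of layer applications, so looping trades memory for repeated computation rather than making depth free.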

The tradeoff here is complexity in training dynamics. Without intervention, repeated looping risks signal degradation or drift, where the model’s internal state diverges from the original input information, impairing reasoning coherence.

The input injection mechanism addresses this by injecting the original input embedding at every loop step, combined linearly with the previous hidden state and the transformer’s output. This keeps the input signal alive and stable, enabling the model to perform implicit chain-of-thought reasoning in its latent space across loops.
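Unrolling the update rule makes the "keeps the input signal alive" claim easier to see. The derivation below is a sketch under an assumption the source does not state, namely that the spectral radius of A is below 1. The shorthand r_t for the transformer term is introduced here for convenience.

```latex
% Shorthand: r_t = \text{Transformer}(h_t, e), so h_{t+1} = A h_t + B e + r_t.
% Unrolling n loop steps from the initial state h_0:
h_n = A^{n} h_0 + \sum_{k=0}^{n-1} A^{k} B\, e + \sum_{k=0}^{n-1} A^{\,n-1-k}\, r_k

% Assumption (not from the source): the spectral radius of A is below 1. Then
\lim_{n \to \infty} A^{n} h_0 = 0,
\qquad
\lim_{n \to \infty} \sum_{k=0}^{n-1} A^{k} B\, e = (I - A)^{-1} B\, e
```

Under that assumption the contribution of the initial state fades with more loops, while the injected term converges to a fixed, nonzero function of e. Without the B term, the input could only reach later iterations through the r_k terms.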

Supporting switchable attention mechanisms like MLA and GQA adds flexibility in handling attention computation, potentially optimizing for different workloads or hardware.
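As a concrete illustration of why the attention variant matters for hardware, here is a sketch of the head sharing behind GQA, taken in its usual sense of grouped-query attention: several query heads read from one shared key/value head, which shrinks the key/value cache. The function names and head counts are hypothetical and are not taken from the OpenMythos code.

```python
def kv_head_for_query(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    """Index of the key/value head that a given query head reads from."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    if not 0 <= q_head < n_q_heads:
        raise ValueError("q_head out of range")
    group_size = n_q_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_fraction(n_q_heads: int, n_kv_heads: int) -> float:
    """KV cache size relative to full multi-head attention."""
    return n_kv_heads / n_q_heads


if __name__ == "__main__":
    n_q, n_kv = 8, 2  # hypothetical head counts
    print([kv_head_for_query(q, n_q, n_kv) for q in range(n_q)])
    print(kv_cache_fraction(n_q, n_kv))
```

Setting `n_kv_heads` equal to `n_q_heads` recovers ordinary multi-head attention, and setting it to 1 gives multi-query attention.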

Sparse Mixture of Experts layers extend the parameter efficiency by routing tokens dynamically to expert subnetworks, which helps scale the model to trillions of parameters while managing compute costs.
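The routed-plus-shared pattern can be sketched for a single token as follows. This is a simplified illustration under stated assumptions: experts are plain scalar functions, the router uses a softmax followed by top-k selection with renormalized weights (implementations differ on this ordering), and none of the names come from OpenMythos.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Return (expert index, weight) for the k highest-scoring experts."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [v / total for v in exps]
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]  # weights sum to 1


def moe_forward(x: float, router_logits: Sequence[float],
                routed_experts: Sequence[Expert],
                shared_experts: Sequence[Expert], k: int = 2) -> float:
    """Shared experts always run; only the top-k routed experts run."""
    out = sum(expert(x) for expert in shared_experts)
    for idx, weight in top_k_route(router_logits, k):
        out += weight * routed_experts[idx](x)
    return out


if __name__ == "__main__":
    # Hypothetical experts: each one scales its input by a different factor.
    routed = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
    shared = [lambda x: x]
    logits = [0.0, 2.0, 1.0, -1.0]
    print(top_k_route(logits, k=2))
    print(moe_forward(1.0, logits, routed, shared, k=2))
```

Only `k` of the routed experts execute per token, so total parameters can grow with the number of experts while per-token compute stays roughly constant.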

The codebase is written in Python, presumably on top of PyTorch, and includes configuration for Flash Attention to accelerate attention computations on CUDA-enabled hardware.

quick start

Installation is straightforward via pip:

pip install open-mythos

# uv pip install open-mythos

For users with CUDA and build tools who want to enable Flash Attention 2 in the GQAttention module:

pip install "open-mythos[flash]"

This installs the package and optional acceleration dependencies. Beyond installation, the repo’s README and documentation presumably include instructions for model configuration and training, but these are not detailed here.
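After installing, one quick way to see whether the optional acceleration dependency is present is to check for its module without importing it. This sketch assumes the extra pulls in the `flash-attn` package, whose import name is `flash_attn`. The helper `has_module` is a hypothetical name, and the check says nothing about how OpenMythos itself detects Flash Attention.

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Assumption: the optional extra installs the flash-attn package.
    if has_module("flash_attn"):
        print("flash_attn found: Flash Attention 2 should be usable")
    else:
        print("flash_attn not found: expect the standard attention path")
```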

verdict

OpenMythos is a technically interesting research platform exploring an alternative approach to transformer depth through recurrent looping and input injection. It offers a potential path to more efficient model scaling and systematic generalization with extremely long context windows.

However, this is not a plug-and-play library for production use. The complexity of recurrent transformer training, the need for specialized attention mechanisms, and sparse MoE integration suggest a steep learning curve and heavy infrastructure requirements.

It’s worth looking at if you’re involved in experimental transformer architectures, large-scale model research, or want to understand new approaches to deep reasoning in LLMs. For developers seeking stable, production-ready transformer implementations, more conventional architectures remain safer bets.

The code quality appears solid and well-structured for research, with an emphasis on configurability and extensibility. The input injection mechanism is a neat architectural solution to a known problem and worth understanding even if you don’t adopt the full approach.

→ GitHub Repo: kyegomez/OpenMythos ⭐ 11,590 · Python

Noureddine RAMDI / OpenMythos: Exploring recurrent-depth transformers with input injection for sustained reasoning
