Noureddine RAMDI / MicroGPT-C: Coordinating tiny GPT-2 models in C for edge logical reasoning

Created Mon, 04 May 2026 10:23:02 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

enjector/microgpt-c

MicroGPT-C breaks the mold of large monolithic language models by orchestrating multiple tiny GPT-2-style models in C to solve focused logic tasks, with reported win or solve rates of 78–91%. Instead of scaling up model size, it relies on a deterministic coordination pattern called the Organelle Pipeline Architecture (OPA), which lets small specialist models collaborate on games like Sudoku and Mastermind and, according to the project, outperform bigger single models.

Organelle pipeline architecture for edge GPT-2 inference

At its core, MicroGPT-C is a zero-dependency C99 implementation of a GPT-2 engine designed to run on edge devices. The repo implements multiple tiny transformer models ranging from 30K to 460K parameters, each trained on a specific subtask. These models, called organelles, are orchestrated by deterministic C scaffolding: the Organelle Pipeline Architecture (OPA).

This architecture coordinates the outputs of the individual organelle models to collectively solve logical reasoning games. Typical tasks include Pentago, 8-Puzzle, Connect-4, Mastermind, and Sudoku, where the ensemble achieves win or solve rates between 78% and 91%. The results suggest that composing smaller, focused models can outperform a single larger model on these tasks.
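To make the coordination idea concrete, here is a minimal sketch of a propose, verify, retry loop of the kind the description implies. Everything in it is an assumption for illustration: the `organelle_fn` interface, the 9-cell board, and the stub proposer are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <stddef.h>

#define CELLS 9
#define NO_MOVE (-1)

/* Hypothetical organelle interface: a tiny model maps a board and an
   attempt index to one candidate cell. Not the repository's API. */
typedef int (*organelle_fn)(const int board[CELLS], int attempt);

/* Stub standing in for a trained model: cheap, deterministic, often wrong. */
static int stub_proposer(const int board[CELLS], int attempt) {
    (void)board;
    return (attempt * 4) % CELLS; /* proposes 0, 4, 8, 3, 7, ... */
}

/* Deterministic rule check: the scaffold, not a model, decides legality. */
static int is_legal(const int board[CELLS], int cell) {
    return cell >= 0 && cell < CELLS && board[cell] == 0;
}

/* Propose, verify, retry. If the model never produces a legal move within
   max_attempts, fall back to a plain scan for the first empty cell. */
static int pipeline_pick_move(organelle_fn propose, const int board[CELLS],
                              int max_attempts) {
    assert(propose != NULL && max_attempts >= 0);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int cell = propose(board, attempt);
        if (is_legal(board, cell)) return cell;
    }
    for (int cell = 0; cell < CELLS; cell++) {
        if (board[cell] == 0) return cell;
    }
    return NO_MOVE;
}
```

Calling `pipeline_pick_move(stub_proposer, board, 3)` on a board whose cells 0 and 4 are occupied rejects the first two proposals and accepts the third. The point of the design is that the model only has to be right some of the time, because the C scaffold filters out what the rules forbid.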

Under the hood, MicroGPT-C uses several technical innovations:

  • Memory Sparse Attention (MSA): Handles arbitrarily long sequences without the quadratic memory blowup typical of transformers.

  • TurboQuant 4-bit compression: Reduces memory footprint by 8× and speeds up generation by 25%, both of which matter for edge deployment.

  • Prefix KV cache sharing: Speeds up ensemble inference by 1.9–5.7× by sharing key-value caches between models.
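The 8× figure matches what you would expect from moving 32-bit floats to 4-bit codes. The sketch below shows a generic symmetric 4-bit quantiser with one scale per block. It is an illustration of the general technique, not TurboQuant itself, whose actual scheme is not described here; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one float to a signed 4-bit code in [-7, 7], stored as 1..15. */
static int to_code(float v, float scale) {
    float q = v / scale;
    int r = (int)(q >= 0.0f ? q + 0.5f : q - 0.5f); /* round half away from zero */
    if (r > 7) r = 7;
    if (r < -7) r = -7;
    return r + 8;
}

/* Quantise n floats (n must be even) into n/2 bytes, two codes per byte.
   Returns the scale needed to dequantise the block. */
static float quantize_q4(const float *x, size_t n, uint8_t *packed) {
    assert(n % 2 == 0);
    float absmax = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > absmax) absmax = a;
    }
    float scale = absmax > 0.0f ? absmax / 7.0f : 1.0f;
    for (size_t i = 0; i < n; i += 2) {
        int lo = to_code(x[i], scale);
        int hi = to_code(x[i + 1], scale);
        packed[i / 2] = (uint8_t)(lo | (hi << 4));
    }
    return scale;
}

/* Recover an approximation of element i from the packed block. */
static float dequantize_q4(const uint8_t *packed, size_t i, float scale) {
    uint8_t byte = packed[i / 2];
    int code = (i % 2 == 0) ? (byte & 0x0F) : (byte >> 4);
    return (float)(code - 8) * scale;
}
```

Four floats (16 bytes) pack into 2 bytes, which is the 8× ratio; the per-block scale adds a small overhead that the headline number ignores. Each recovered value is within half a quantisation step of the original.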

The benchmarks were run single-threaded on an Apple M2 Max, with models ranging from 360KB to 5.4MB in size. Even on a single thread, the engine achieves 3.7–5.8 million operations per second, illustrating the efficiency of the C implementation.

Why MicroGPT-C stands out: deterministic coordination and tradeoffs

What distinguishes MicroGPT-C is the Organelle Pipeline Architecture, which orchestrates multiple tiny models rather than scaling model size. Each organelle on its own reaches only about 50% accuracy, but combined deterministically they achieve up to 91% on Pentago and 90% on 8-Puzzle.
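A little arithmetic shows how a jump from roughly 50% to around 90% can arise. The sketch below assumes (this is not stated in the source) that each proposal is independently correct with probability p and that the deterministic scaffold always recognises a wrong proposal and retries. Under those assumptions the chance of at least one correct proposal in k attempts is 1 - (1 - p)^k.

```c
#include <assert.h>

/* Probability that at least one of k independent attempts is correct,
   given per-attempt accuracy p. Assumes a perfect verifier. */
static double success_after_retries(double p, int k) {
    assert(p >= 0.0 && p <= 1.0 && k >= 0);
    double all_fail = 1.0;
    for (int i = 0; i < k; i++) {
        all_fail *= 1.0 - p;
    }
    return 1.0 - all_fail;
}
```

With p = 0.5, three attempts give 0.875 and four give 0.9375, which brackets the reported 90–91%. Real proposals are unlikely to be fully independent, so this is an upper-bound intuition rather than an account of how the project reaches its numbers.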

This approach challenges the common assumption that bigger models are always better. Instead, MicroGPT-C shows the power of structured collaboration, where the whole is greater than the sum of parts. The deterministic scaffolding in C acts as a conductor, guiding the organelles to work together efficiently.

The codebase is surprisingly clean for a complex system juggling multiple models and attention mechanisms. It has zero external dependencies, relying only on C99 and CMake, which makes it portable and easy to build on various platforms.

The tradeoff is clear: this architecture excels in focused logical domains but faces limits in continuous or high-dimensional reasoning, which the repo documents as the “Discretisation Wall.” The repo also includes a lottery negative control experiment, intended to show that the engine learns real patterns rather than exploiting artifacts.

Benchmarks show notable performance metrics:

  • 91% win rate on Pentago using two organelles (92K params each)
  • 90% solve rate on 8-Puzzle with five organelles (460K params each)
  • 78% solve rate on Sudoku with two organelles (160K params each)
  • Character-level throughput: 28K tokens/sec training, 16K tokens/sec inference (841K params)
  • TurboQuant achieves 8× memory reduction with a 25% speedup in generation
  • Prefix KV Cache yields up to 5.7× ensemble speedup
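The prefix KV cache numbers can be sanity-checked with a simple cost model. The sketch below counts token forward passes for n ensemble samples that share a prompt of p tokens and each generate g tokens. It is a back-of-envelope model under stated assumptions (equal cost per token, no other overheads), not the way the project measured its speedups, and the function names are hypothetical.

```c
#include <assert.h>

/* Every sample re-processes the full prompt before generating. */
static long cost_without_sharing(long n, long p, long g) {
    return n * (p + g);
}

/* The prompt is processed once; its cached keys/values are reused. */
static long cost_with_sharing(long n, long p, long g) {
    return p + n * g;
}

/* Ratio of the two costs: how much work prefix sharing saves. */
static double prefix_sharing_speedup(long n, long p, long g) {
    assert(n > 0 && p >= 0 && g > 0);
    return (double)cost_without_sharing(n, p, g) /
           (double)cost_with_sharing(n, p, g);
}
```

For 4 samples, a 30-token prompt, and 10 generated tokens each, the cost drops from 160 to 70 token passes, about 2.3×. In this model the speedup is always below n and approaches n as the shared prefix dominates, so a 5.7× gain would need at least six samples sharing a long prefix.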

Compared to Andrej Karpathy’s microgpt.py, MicroGPT-C trains ~1,000× faster and performs inference ~700× faster, a testament to its optimized C implementation and architectural design.

Quick start with MicroGPT-C

    git clone https://github.com/enjector/microgpt-c.git
    cd microgpt-c
    mkdir build && cd build
    cmake ..
    cmake --build . --config Release

## Requirements

- **C99 compiler** (GCC, Clang, MSVC)
- **CMake 3.10+**
- No other dependencies

Optional: Git LFS for pretrained checkpoints (`git lfs pull`).

Building the project is straightforward with CMake and a modern C99 compiler. Once built, you can experiment with the pretrained organelle models for various logic games as described in the documentation.

Verdict: a niche but insightful approach to efficient edge GPT

MicroGPT-C is a compelling demonstration that multiple small, specialized GPT-2 models can be orchestrated to outperform single large models on certain logical tasks. Its zero-dependency C implementation and memory optimizations make it well suited for edge deployments where resources are limited.

That said, this approach is niche: it shines in discrete logical reasoning games but runs into fundamental limits when applied to continuous or more general NLP tasks. The “Discretisation Wall” is a candid admission of this boundary.

For practitioners interested in efficient transformer implementations, edge AI, or novel model composition patterns, MicroGPT-C is worth exploring. The codebase is clean, well-documented, and offers concrete benchmarks that back its claims.

It’s not a replacement for large-scale LLMs but a complementary architecture that suggests composition can beat capacity in focused domains. Worth understanding even if you don’t adopt it directly.


→ GitHub Repo: enjector/microgpt-c ⭐ 102 · C