X’s recommendation engine, open sourced as “the-algorithm”, powers the For You Timeline and Recommended Notifications that millions see daily. What’s striking is the multi-stage pipeline that blends graph-based candidate sourcing, neural ranking models, and visibility filters, orchestrated by a modular software framework. This repo offers a rare look under the hood of a production-scale social feed recommender built with Scala and Rust.
What the-algorithm does and how it’s built
At its core, this repo implements the recommendation algorithm behind X’s For You Timeline and Recommended Notifications — the personalized streams of tweets and content tailored to each user. It’s built atop a shared infrastructure of data services, machine learning models, and feed construction frameworks.
The stack is primarily Scala, with Java for parts such as the search index and Rust for ML model serving (the “navi” framework). Bazel is the build system. The repo ships BUILD files for most components but no top-level BUILD or WORKSPACE file, so building it locally is non-trivial.
The architecture decomposes into three main stages:
Candidate sourcing: This is where potential posts to recommend are gathered. Roughly half of the posts come from the search-index service, which finds and ranks in-network posts (from accounts the user follows). Additional candidates come from two sources. UTEG (user-tweet-entity-graph) keeps an in-memory graph of user-post interactions and finds candidates by traversing it. The follow-recommendation-service suggests accounts to follow and posts from those accounts.
Ranking: After candidates are sourced, they go through a two-tier ranking process. A light-ranker, run inside the search index (Earlybird), scores posts cheaply. The heavy-ranker, a neural network, then scores candidates more precisely. These stages draw on shared models such as SimClusters (community detection and sparse embeddings into those communities) and TwHIN (dense knowledge-graph embeddings of users and posts).
Filtering: Finally, visibility filters apply rules for legal compliance, product quality, and user trust. They can remove content outright (hard filtering) or apply softer treatments such as labels and downranking.
The For You Timeline itself is assembled by home-mixer, a service built on the product-mixer framework. It handles feed construction and combines the outputs of multiple candidate sources and rankers into one personalized feed.
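The stages above can be sketched as a small orchestration loop. This is a minimal illustration under stated assumptions:

- The trait and type names (`CandidateSource`, `Scorer`, `VisibilityFilter`, `FeedPipeline`) are hypothetical. They are not product-mixer's API.
- The filter here only hard-drops candidates. The real visibility filters can also label or downrank content.

```rust
use std::collections::HashSet;

// Hypothetical types sketching a multi-stage feed pipeline; these are NOT
// the real product-mixer or home-mixer APIs.
#[derive(Clone, Debug, PartialEq)]
pub struct Candidate {
    pub post_id: u64,
    pub author_id: u64,
    pub score: f64,
}

/// Stage 1: produces raw candidates for a user (e.g. search index, graph walk).
pub trait CandidateSource {
    fn fetch(&self, user_id: u64) -> Vec<Candidate>;
}

/// Stage 2: assigns a relevance score to one candidate.
pub trait Scorer {
    fn score(&self, user_id: u64, candidate: &Candidate) -> f64;
}

/// Stage 3: returns false if the candidate must be hidden from this user.
pub trait VisibilityFilter {
    fn allow(&self, user_id: u64, candidate: &Candidate) -> bool;
}

pub struct FeedPipeline {
    pub sources: Vec<Box<dyn CandidateSource>>,
    pub scorer: Box<dyn Scorer>,
    pub filters: Vec<Box<dyn VisibilityFilter>>,
}

impl FeedPipeline {
    pub fn run(&self, user_id: u64, feed_size: usize) -> Vec<Candidate> {
        // Gather candidates from every source, keeping the first copy of each post.
        let mut seen = HashSet::new();
        let mut pool: Vec<Candidate> = self
            .sources
            .iter()
            .flat_map(|source| source.fetch(user_id))
            .filter(|c| seen.insert(c.post_id))
            .collect();

        // Score each deduplicated candidate.
        for c in pool.iter_mut() {
            let s = self.scorer.score(user_id, c);
            c.score = s;
        }

        // Drop anything that any visibility filter rejects.
        pool.retain(|c| self.filters.iter().all(|f| f.allow(user_id, c)));

        // Highest score first, then cut to the requested feed size.
        pool.sort_by(|a, b| b.score.total_cmp(&a.score));
        pool.truncate(feed_size);
        pool
    }
}
```

Keeping sources, scorers, and filters behind small traits means a new source or policy can be added without touching the loop itself. That is the kind of modularity a feed framework like product-mixer is meant to provide.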
What stands out technically and the tradeoffs involved
The most interesting technical strength is the layered approach to candidate generation and ranking. Candidate sourcing is diverse. In-network posts from the search index (about 50% of posts) are mixed with out-of-network posts found through graph traversals (UTEG) and follow recommendations. This balances content from familiar accounts against discovery of new ones.
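To make "graph traversal" concrete, here is a minimal sketch of candidate sourcing over an in-memory user-post engagement graph, in the spirit of UTEG. It works in three hops:

1. Start from the posts the user engaged with.
2. Hop to other users who engaged with the same posts.
3. Count how many such paths reach each post the user has not yet seen.

`EngagementGraph` and `recommend` are hypothetical names, and the path-counting score is an assumption. UTEG's actual traversal and weighting may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical bipartite user <-> post engagement graph held in memory.
#[derive(Default)]
pub struct EngagementGraph {
    user_to_posts: HashMap<u64, Vec<u64>>,
    post_to_users: HashMap<u64, Vec<u64>>,
}

impl EngagementGraph {
    /// Records that `user` engaged with `post` (stored in both directions).
    pub fn add_engagement(&mut self, user: u64, post: u64) {
        self.user_to_posts.entry(user).or_default().push(post);
        self.post_to_users.entry(post).or_default().push(user);
    }

    /// Walk user -> engaged posts -> co-engaging users -> their posts.
    /// Returns unseen posts with their path counts, highest count first.
    pub fn recommend(&self, user: u64, k: usize) -> Vec<(u64, usize)> {
        let empty = Vec::new();
        let seed_posts = self.user_to_posts.get(&user).unwrap_or(&empty);
        let already_seen: HashSet<u64> = seed_posts.iter().copied().collect();
        let mut counts: HashMap<u64, usize> = HashMap::new();

        for post in seed_posts {
            for &other in self.post_to_users.get(post).unwrap_or(&empty) {
                if other == user {
                    continue; // skip the walk back to the starting user
                }
                for &candidate in self.user_to_posts.get(&other).unwrap_or(&empty) {
                    if !already_seen.contains(&candidate) {
                        *counts.entry(candidate).or_insert(0) += 1;
                    }
                }
            }
        }

        let mut ranked: Vec<(u64, usize)> = counts.into_iter().collect();
        // Most paths first; ties broken by post id so output is deterministic.
        ranked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
        ranked.truncate(k);
        ranked
    }
}
```

Counting paths favors posts that many like-minded users engaged with. A production system would also cap fan-out per hop, since a popular post can connect to a very large number of users.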
Graph-based embedding models add depth to the feature extraction. SimClusters detects communities and embeds users and posts sparsely into them, while TwHIN learns dense knowledge-graph embeddings for users and posts. These models capture user and content relationships beyond direct interactions, which is crucial in social media recommendations.
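As an illustration of how sparse community embeddings can be used for scoring, the sketch below treats a user and a post as maps from community id to weight and compares them with cosine similarity. The `SparseEmbedding` type and `cosine` function are hypothetical. Cosine is just one common choice, and SimClusters' own scoring functions may differ.

```rust
use std::collections::HashMap;

/// Hypothetical sparse embedding: community id -> affinity weight.
pub type SparseEmbedding = HashMap<u32, f64>;

/// Cosine similarity of two sparse vectors; 0.0 if either has zero norm.
pub fn cosine(a: &SparseEmbedding, b: &SparseEmbedding) -> f64 {
    // Walk the smaller map and look up shared communities in the larger one.
    let (small, large) = if a.len() <= b.len() { (a, b) } else { (b, a) };
    let dot: f64 = small
        .iter()
        .filter_map(|(community, w1)| large.get(community).map(|w2| w1 * w2))
        .sum();

    // Euclidean norm over the non-zero entries only.
    let norm = |e: &SparseEmbedding| e.values().map(|w| w * w).sum::<f64>().sqrt();
    let denom = norm(a) * norm(b);
    if denom == 0.0 {
        0.0
    } else {
        dot / denom
    }
}
```

Iterating over the smaller map keeps the dot product proportional to the number of communities in the sparser vector. That is the main computational benefit of keeping embeddings sparse.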
The heavy-ranker is a computationally expensive but precise neural network. It runs after candidate sourcing and is one of the main signals for selecting which posts appear. The tiered design trades latency against quality: the light-ranker cheaply weeds out weak candidates, so fewer of them reach the expensive heavy-ranker.
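The light/heavy split is a ranking cascade. The generic sketch below assumes both rankers can be modeled as plain scoring closures. It scores every candidate with a cheap function, keeps the top `keep`, and calls the expensive function only on those survivors. The `cascade` function is hypothetical and says nothing about how the real rankers are implemented or served.

```rust
/// Two-stage ranking cascade (hypothetical helper, not the repo's code).
/// `light` scores every candidate; `heavy` only scores the top `keep`.
pub fn cascade<C, L, H>(candidates: Vec<C>, keep: usize, light: L, mut heavy: H) -> Vec<(C, f64)>
where
    L: Fn(&C) -> f64,
    H: FnMut(&C) -> f64,
{
    // Stage 1: cheap scores for all candidates, keep only the best `keep`.
    let mut shortlist: Vec<(C, f64)> = candidates
        .into_iter()
        .map(|c| {
            let s = light(&c);
            (c, s)
        })
        .collect();
    shortlist.sort_by(|a, b| b.1.total_cmp(&a.1));
    shortlist.truncate(keep);

    // Stage 2: expensive scores for the survivors, then re-sort by them.
    let mut ranked: Vec<(C, f64)> = shortlist
        .into_iter()
        .map(|(c, _)| {
            let s = heavy(&c);
            (c, s)
        })
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

With 1,000 candidates and `keep = 100`, the heavy scorer runs 100 times instead of 1,000. The cost is that any candidate the light stage misjudges never gets a second look.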
The presence of Rust-based ML serving (“navi”) alongside Scala services is a practical polyglot approach: Scala handles backend business logic and orchestration, while Rust provides efficient, low-latency ML model serving.
The tradeoffs are clear: this architecture is complex and requires coordination across multiple services and languages, which complicates local development and testing. The missing top-level BUILD and WORKSPACE files suggest partial open sourcing, possibly to avoid exposing internal build infrastructure.
The codebase appears well-structured around domain-specific services (tweetypie for tweet data, user-signal-service for user interactions), but the multi-language setup and heavy reliance on ML models mean newcomers face a steep learning curve.
Explore the project
The repo is substantial, but without a direct quickstart or installation instructions, the best way to get a feel for it is to explore the core directories and documentation:
tweetypie/: Core service that handles reading and writing tweet (post) data.
user-signal-service/: Manages user interaction signals that feed into recommendation features.
product-mixer/: The framework for building feeds of content. home-mixer/, built on it, constructs and serves the For You Timeline.
src/scala/com/twitter/simclusters_v2/: SimClusters community detection and embeddings. TwHIN and the heavy-ranker model are published in the companion twitter/the-algorithm-ml repository rather than here.
navi/: Rust-based ML model serving framework for low-latency inference.
src/java/com/twitter/search/: The search index (Earlybird), which powers in-network candidate sourcing and runs the light-ranker.
The README and docs describe the high-level pipeline and the role of each key component. Because the repo uses Bazel, familiarity with Bazel and multi-language build orchestration is essential to build or extend it.
Overall, the data flow in home-mixer and the product-mixer interfaces it builds on are a good starting point for understanding how candidates flow from sourcing to ranking and filtering.
Verdict
Twitter’s “the-algorithm” repo is a rare window into an advanced, production-scale recommendation system that combines graph-based embeddings, multi-stage ranking, and modular service orchestration. For engineers interested in recommendation engine design, feed architecture, and ML system integration, it’s a valuable resource.
That said, the repo is complex, partially open-sourced (missing top-level Bazel files), polyglot (mainly Scala, plus Java and Rust), and lacks quickstart instructions. Building and running it locally requires significant effort and domain knowledge.
It’s best suited for engineers or teams with experience in large-scale service architectures, ML serving, and Scala/Rust ecosystems, looking to understand or build similar multi-stage recommendation pipelines. For casual exploration, the documentation and code layout provide plenty of insights into how a major social platform powers personalized feeds at scale.
→ GitHub Repo: twitter/the-algorithm ⭐ 73,106 · Scala