Graph-R1: Reinforcement learning to train LLMs for reasoning over knowledge graphs

Graph-R1 addresses a hard problem in applying large language models (LLMs) to structured knowledge: reasoning over knowledge graphs end to end. Instead of treating graph retrieval as a static backend lookup, the framework uses reinforcement learning (RL) to train an LLM to iterate through “think → generate query → retrieve subgraph → rethink”. The model thereby learns graph traversal policies optimized for downstream knowledge-intensive tasks such as question answering in healthcare, finance, and law.

What Graph-R1 does: RL-driven graph reasoning for LLMs

At its core, Graph-R1 is a Python-based research framework that integrates reinforcement learning into GraphRAG, a form of retrieval-augmented generation that retrieves from a graph rather than from flat text chunks. It extends existing approaches with an explicit RL training loop in which the LLM learns to perform reasoning steps over a knowledge hypergraph constructed from n-ary relations.

The knowledge base is a hypergraph rather than a simple graph: each hyperedge captures an n-ary relation among several entities, obtained through relation extraction. This richer structure lets the model reason over facts that binary edges cannot express in one piece.
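To make the contrast with a binary-edge graph concrete, here is a minimal sketch of an n-ary hypergraph in plain Python. The names `Hyperedge`, `Hypergraph`, and `neighbors`, along with the example fact, are illustrative assumptions and not Graph-R1's actual data model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    """One n-ary fact: a text description linking any number of entities."""
    description: str
    entities: frozenset


class Hypergraph:
    """Toy hypergraph: an index from each entity to the hyperedges it joins."""

    def __init__(self):
        self.edges = []
        self.by_entity = defaultdict(list)

    def add(self, description, entities):
        edge = Hyperedge(description, frozenset(entities))
        self.edges.append(edge)
        for entity in edge.entities:
            self.by_entity[entity].append(edge)
        return edge

    def neighbors(self, entity):
        """Entities that share at least one hyperedge with `entity`."""
        found = set()
        for edge in self.by_entity[entity]:
            found |= edge.entities
        found.discard(entity)
        return found


graph = Hypergraph()
# A single hyperedge ties three entities together; a binary graph would need
# several edges (or an extra node) to express the same fact.
graph.add(
    "Drug A treats condition B at dosage C",
    ["Drug A", "condition B", "dosage C"],
)
print(sorted(graph.neighbors("Drug A")))
```

Keeping the whole fact in one hyperedge means a retrieval step can return the complete relation instead of reassembling it from pairwise links.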

The framework operates in an agentic cycle: the LLM first thinks about the query, then generates a graph query, retrieves a relevant subgraph from the knowledge hypergraph, and finally rethinks or updates its internal state based on the retrieved information. This cycle can repeat multiple times to refine the answer.

Training uses policy gradient methods (GRPO, REINFORCE++, and PPO) to optimize the LLM’s query generation and reasoning steps against explicit reward signals. These rewards measure the quality of retrieved subgraphs and reasoning outcomes, steering the model toward better graph traversal policies.
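Of the three algorithms, GRPO is the simplest to illustrate: instead of a learned value baseline, it uses statistics computed over a group of rollouts sampled for the same question. The sketch below shows only that advantage step in plain Python. The function name, the `eps` constant, the use of the population standard deviation, and the example rewards are assumptions for illustration; Graph-R1's own training code may differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other rollouts for the same question."""
    baseline = mean(rewards)
    spread = pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - baseline) / (spread + eps) for r in rewards]


# Four hypothetical rollouts for one question, scored by some reward function.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Rollouts that beat the group average get a positive advantage and are reinforced; those below it are discouraged, and a group with identical rewards contributes no gradient signal.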

The project builds on prior work such as Agent-R1, HyperGraphRAG, and LightRAG, advancing them by focusing on end-to-end RL training and n-ary relation hypergraphs. The target domains are knowledge-intensive fields where structured graph reasoning is valuable.

Graph-R1 is implemented in Python. It relies on PyTorch 2.4.0 for model training and FlashAttention for efficient attention computation, uses Conda for environment management, and includes preprocessing scripts for common question answering datasets such as 2WikiMultiHopQA and HotpotQA.

The reinforcement learning cycle that sets Graph-R1 apart

What distinguishes Graph-R1 is its explicit end-to-end reinforcement learning loop that teaches the LLM to treat graph reasoning as a sequential decision-making problem. Instead of relying on static retrieval or heuristic subgraph selection, the model learns a policy that balances exploration and exploitation in the knowledge graph.

This cycle is implemented as:

Think: The LLM interprets the question and forms an internal reasoning state.
Generate query: Based on this state, it produces a structured query to retrieve a subgraph.
Retrieve subgraph: The framework fetches the subgraph matching the query from the hypergraph.
Rethink: The LLM incorporates the retrieved information, updating its reasoning.

This iterative loop can run multiple times, progressively refining the information context and leading to better final answers. The policy gradient training optimizes this behavior using reward signals that assess the relevance and correctness of retrieved information and answers.
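The four steps can be sketched as a small control loop. This is a sketch under stated assumptions, not the framework's implementation: `run_episode`, the action dictionary format, and the scripted `toy_policy` and `toy_retrieve` stand-ins are all hypothetical, and a real system would call an LLM and a hypergraph retriever in their place.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    thought: str
    query: str
    retrieved: list


@dataclass
class Episode:
    question: str
    steps: list = field(default_factory=list)
    answer: str = ""


def run_episode(question, policy, retrieve, max_turns=3):
    """Alternate between the policy and the retriever until it answers."""
    episode = Episode(question)
    for _ in range(max_turns):
        # Think + generate query: the policy sees the question and prior steps.
        action = policy(question, episode.steps)
        if action["type"] == "answer":
            episode.answer = action["text"]
            break
        # Retrieve subgraph: fetch facts matching the generated query.
        facts = retrieve(action["query"])
        # Rethink: the retrieved facts become context for the next turn.
        episode.steps.append(Step(action["thought"], action["query"], facts))
    return episode


# --- Stand-ins so the sketch runs without an LLM or a real hypergraph ---
FACTS = {
    "director of Film X": ["Film X was directed by Person Y"],
    "birthplace of Person Y": ["Person Y was born in City Z"],
}


def toy_retrieve(query):
    return FACTS.get(query, [])


def toy_policy(question, steps):
    # A scripted policy: two retrieval hops, then an answer.
    if len(steps) == 0:
        return {"type": "query", "thought": "Find the director first.",
                "query": "director of Film X"}
    if len(steps) == 1:
        return {"type": "query", "thought": "Now find where they were born.",
                "query": "birthplace of Person Y"}
    return {"type": "answer", "text": "City Z"}


episode = run_episode("Where was the director of Film X born?",
                      toy_policy, toy_retrieve)
print(len(episode.steps), episode.answer)
```

The `max_turns` cap is one simple way to bound the loop; if the policy never answers within the budget, the episode ends with an empty answer, which a reward function can then penalize.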

The tradeoff here is complexity: training such a model end-to-end with reinforcement learning is resource-intensive and requires careful reward design. But the payoff is a more flexible and adaptive LLM that can navigate complex graph-structured knowledge effectively.
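As one illustration of what reward design can involve, the sketch below scores a final answer by token-level F1 against a reference answer, a common metric in question answering. This is an assumed example: the text above does not specify Graph-R1's exact reward, and the function name and the whitespace-and-lowercase normalization are hypothetical choices.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("City Z", "city z"))
print(token_f1("the city of Z", "City Z"))
```

A graded score like this gives partial credit for nearly correct answers, which tends to provide a denser learning signal than exact match alone.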

The codebase reflects this complexity with components for relation extraction, hypergraph construction, RL training algorithms, and multi-dataset support. The training scripts integrate GRPO, PPO, and REINFORCE++ implementations tailored to this problem.

Quick start

Install Environment

conda create -n graphr1 python==3.11.11
conda activate graphr1
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
pip3 install -e .
pip3 install -r requirements.txt

# pip install "ray[default]" debugpy

Dataset Preparation

The framework supports six datasets: 2WikiMultiHopQA, HotpotQA, Musique, NQ, PopQA, and TriviaQA. Download datasets from TeraBox and place them under the datasets/ directory.

Quick Start: Graph-R1 on 2WikiMultiHopQA

Preprocess the 2WikiMultiHopQA dataset to parquet format:

python script_process.py --data_source 2WikiMultiHopQA

From there, users can run training and evaluation scripts according to the provided documentation.

Verdict

Graph-R1 is a solid research framework for those interested in pushing the boundaries of LLM reasoning over structured knowledge graphs. Its main value lies in the end-to-end RL training loop that treats graph traversal and retrieval as a learned policy rather than a fixed heuristic.

The tradeoff is the complexity and computational cost of training such models, which makes it more suited for research labs or teams with access to significant compute resources. The codebase is specialized and assumes familiarity with reinforcement learning, graph-based knowledge representation, and transformer-based LLMs.

For engineers and researchers working on knowledge-intensive LLM applications, especially in domains like healthcare or law, Graph-R1 offers a practical and extensible starting point. However, for production use or simpler graph retrieval tasks, lighter-weight or static retrieval-augmented generation approaches may suffice.

The framework’s modular design and use of standard Python ML libraries make it adaptable, but expect a learning curve to master the RL training cycle and hypergraph construction.

In short, Graph-R1 is worth exploring if you want to experiment with RL-driven LLM reasoning over complex graph structures and can invest in the necessary infrastructure and expertise.

→ GitHub Repo: LHRLAB/Graph-R1 ⭐ 522 · Python

Noureddine RAMDI / Graph-R1: Reinforcement learning to train LLMs for reasoning over knowledge graphs