SimScale: a scalable sim-real co-training pipeline for autonomous driving planners

Simulating realistic driving scenarios that help autonomous vehicle planners generalize well to the real world remains a tough problem. SimScale approaches this by combining synthetic simulation data with real-world driving data in a co-training setup. This mix improves planner robustness and performance across different architectures, using a carefully designed pipeline that synthesizes high-fidelity driving clips with pseudo-expert demonstrations.

what SimScale does and how it is architected

SimScale is a Python-based framework for training end-to-end autonomous driving planners via a scalable simulation pipeline. It was introduced in a CVPR 2026 oral paper and focuses on synthesizing diverse, reactive driving scenarios for training planners. The system generates synthetic driving clips sampled at 2 Hz, each about six seconds long, using the nuPlan and NAVSIM datasets as a foundation.

At its core, SimScale organizes simulation into multiple rounds, starting with large-scale synthetic datasets (65K tokens) and gradually narrowing down to smaller ones (down to 17K tokens), which improves sample efficiency while maintaining diversity. These rounds also vary the pseudo-expert supervision type, including planner-based or recovery-based experts.

The framework supports training three distinct planner architectures:

Regression-based LTF (Latent TransFuser)
Diffusion-based DiffusionDrive
Scoring-based GTRS-Dense, which can be trained either with pseudo-expert imitation or reward-only supervision

This variety shows SimScale’s flexibility to accommodate different modeling paradigms. The training scripts and pretrained checkpoints are included, making it straightforward to reproduce or extend the experiments.

what makes SimScale technically interesting

The main technical highlight of SimScale is its sim-real co-training strategy. Instead of training only on real driving logs or purely synthetic data, it blends both by iteratively refining planner policies through simulation rounds. This iterative curriculum lets planners learn from pseudo-expert demonstrations generated within simulated environments that mimic challenging real-world conditions.
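To make the idea of blending the two data sources concrete, here is a minimal sketch of one way to build mixed batches. It is not taken from the SimScale codebase: the function name `mix_batches`, the `sim_fraction` parameter, and the choice to sample synthetic clips with replacement are all assumptions made for illustration, and the article does not state what mixing ratio or sampling scheme SimScale actually uses.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, str]  # (source, sample_id), where source is "real" or "sim"


def mix_batches(
    real: Sequence[str],
    sim: Sequence[str],
    batch_size: int,
    sim_fraction: float,
    seed: int = 0,
) -> List[List[Sample]]:
    """Build one epoch of batches that mix real and synthetic samples.

    Hypothetical helper: every real sample is visited at most once per epoch,
    and each batch is topped up with synthetic samples drawn with replacement.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if not 0.0 <= sim_fraction < 1.0:
        raise ValueError("sim_fraction must be in [0, 1)")

    n_sim = int(batch_size * sim_fraction)  # floor, so n_real is always >= 1
    n_real = batch_size - n_sim
    if n_sim > 0 and not sim:
        raise ValueError("synthetic pool is empty")

    rng = random.Random(seed)
    real_ids = list(real)
    rng.shuffle(real_ids)

    batches: List[List[Sample]] = []
    # Drop the last incomplete chunk of real samples to keep batch sizes fixed.
    for start in range(0, len(real_ids) - n_real + 1, n_real):
        batch: List[Sample] = [("real", r) for r in real_ids[start:start + n_real]]
        batch += [("sim", s) for s in rng.choices(sim, k=n_sim)]
        rng.shuffle(batch)
        batches.append(batch)
    return batches
```

Keeping the real data as the epoch "clock" and treating synthetic clips as a top-up is only one possible design. It keeps the real-data distribution intact while letting the synthetic share be tuned with a single number.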

A key part is how SimScale manages the simulation rounds and pseudo-expert supervision. The configuration system uses parameters like SYN_IDX to select simulation rounds and SYN_GT to toggle between pseudo-expert types. This gives fine-grained control over the data distribution and training signals.
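As a rough illustration of what such a two-knob configuration could look like, the sketch below parses `SYN_IDX` and `SYN_GT` from a mapping and validates them. Only the two parameter names come from the article. The accepted values (`"planner"`, `"recovery"`), the `SynConfig` class, the `from_env` helper, and the split-name format are hypothetical, and the article does not say whether these parameters are environment variables or config-file keys.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical labels for the two pseudo-expert types the article mentions.
PSEUDO_EXPERTS = ("planner", "recovery")


@dataclass(frozen=True)
class SynConfig:
    syn_idx: int  # which simulation round supplies the synthetic clips
    syn_gt: str   # which pseudo-expert supplies the supervision

    def __post_init__(self) -> None:
        if self.syn_idx < 0:
            raise ValueError(f"SYN_IDX must be non-negative, got {self.syn_idx}")
        if self.syn_gt not in PSEUDO_EXPERTS:
            raise ValueError(f"SYN_GT must be one of {PSEUDO_EXPERTS}, got {self.syn_gt!r}")

    def split_name(self) -> str:
        """Illustrative name for the synthetic split this config selects."""
        return f"synthetic_round{self.syn_idx}_{self.syn_gt}"


def from_env(env: Mapping[str, str]) -> SynConfig:
    """Read the two settings from a mapping such as os.environ (assumed source)."""
    return SynConfig(
        syn_idx=int(env.get("SYN_IDX", "0")),
        syn_gt=env.get("SYN_GT", PSEUDO_EXPERTS[0]),
    )
```

Validating both values up front means a typo in either setting fails immediately rather than silently training on the wrong round or supervision type.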

The scoring-based GTRS-Dense planner is particularly notable. It can be trained in two modes: one using imitation of pseudo-expert actions and another relying solely on reward signals, which is less common but potentially more robust. The repo provides model checkpoints for both, allowing direct comparison.
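The difference between the two modes is easier to see with a toy example. The sketch below assumes a generic scoring-based planner that assigns one score to each trajectory in a fixed candidate set. It is not the GTRS-Dense implementation, and the loss forms are assumptions chosen for illustration: `imitation_loss` uses cross-entropy against a soft target built from each candidate's distance to the pseudo-expert trajectory, and `reward_loss` uses binary cross-entropy against a per-candidate reward in [0, 1]. All function names and the `temperature` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(xs)
    log_z = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_z for x in xs]


def imitation_loss(
    scores: Sequence[float],
    expert_distances: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """Cross-entropy between the score distribution and a soft target that
    favours candidates close to the pseudo-expert trajectory."""
    log_target = log_softmax([-d / temperature for d in expert_distances])
    log_probs = log_softmax(scores)
    return -sum(math.exp(lt) * lp for lt, lp in zip(log_target, log_probs))


def reward_loss(scores: Sequence[float], rewards: Sequence[float]) -> float:
    """Mean binary cross-entropy between sigmoid(score) and a reward in [0, 1].

    No expert trajectory is needed: each candidate is supervised only by the
    reward it would receive in simulation.
    """
    total = 0.0
    for s, r in zip(scores, rewards):
        # Stable form of BCE with logits: max(s, 0) - r*s + log(1 + exp(-|s|))
        total += max(s, 0.0) - r * s + math.log1p(math.exp(-abs(s)))
    return total / len(scores)
```

In the imitation mode the training signal depends on a single demonstrated trajectory, while in the reward-only mode every candidate receives its own target, so the pseudo-expert can be dropped from the loss entirely.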

The code is pragmatic, with a clear modular structure that separates data preparation, model training, and evaluation. The pipeline supports tokenizing and batching large-scale synthetic data, and is optimized for efficiency given the volume of simulation data.

Tradeoffs are evident: while synthetic data boosts generalization, it requires careful simulation design and pseudo-expert modeling to avoid overfitting to simulated scenarios. The reliance on nuPlan/NAVSIM datasets for base data means real-world coverage depends on those datasets’ scope.

Results on the NAVSIM v2 benchmarks quantify the gains. For example, the GTRS-Dense planner achieves navhard scores of 46.1 (+7.8) with pseudo-expert training and 46.9 (+8.6) with reward-only training. The regression-based LTF and diffusion-based DiffusionDrive planners also show consistent improvements, supporting the approach’s generality.

quick start with SimScale

The repo provides a minimal setup to get started:

# Clone SimScale Repo
git clone https://github.com/OpenDriveLab/SimScale.git
cd SimScale

# Create environment
conda env create --name simscale -f environment.yml
conda activate simscale
pip install -e .

This sets up the conda environment with all necessary dependencies and installs the package in editable mode. From there, users can run training scripts and experiment with different planner configurations and simulation rounds.

verdict

SimScale is a solid toolkit for researchers and engineers working on autonomous driving planner training who want to integrate synthetic simulation data effectively with real-world driving logs. The sim-real co-training strategy and multi-round simulation curriculum are well thought out and supported by comprehensive scripts and pretrained models.

Its main limitations are the dependency on existing datasets like nuPlan/NAVSIM and the complexity of managing simulation rounds and pseudo-expert types, which may take some ramp-up time to understand. The codebase is practical rather than polished, but well-structured enough to extend.

If you’re developing or benchmarking planners that need better generalization to diverse driving scenarios, especially with scoring-based policies, SimScale offers a valuable reference and starting point. It’s less suited for casual users looking for plug-and-play solutions but worthwhile for anyone serious about simulation-augmented autonomous driving research.


→ GitHub Repo: OpenDriveLab/SimScale ⭐ 245 · Python

Noureddine RAMDI / SimScale: a scalable sim-real co-training pipeline for autonomous driving planners
