NAS3R: Self-supervised 3D reconstruction and camera pose estimation with Gaussian splatting

NAS3R tackles a challenging problem: estimating 3D geometry and camera poses without any ground-truth annotations or pretrained priors. This is notable because most 3D reconstruction frameworks rely on supervised data or pretrained models to achieve stable and accurate results. Instead, NAS3R uses a fully self-supervised feed-forward pipeline built around Gaussian splatting and a VGGT-based architecture. The approach supports 2-view to multi-view configurations, making it flexible for different input setups.

what NAS3R does: self-supervised 3d reconstruction with gaussian splatting

At its core, NAS3R is a framework for jointly estimating 3D geometry and camera parameters from images, without requiring any ground-truth pose or depth data. It builds on 3D Gaussian splatting, in which a scene is represented as a set of 3D Gaussian primitives (each with a position, covariance, opacity, and color) that are rasterized in a differentiable way to synthesize novel views.
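
To make the rendering step concrete, here is a minimal sketch of how splatting produces one pixel's color. Each projected Gaussian contributes an opacity that falls off with distance from its 2D center, and the depth-sorted contributions are alpha-composited front to back. This is a simplified NumPy illustration under stated assumptions (Gaussians already projected to 2D, a single pixel, no alpha clamping or tile-based culling). It is not the diff-gaussian-rasterization CUDA kernel, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a simplified sketch, not NAS3R's or diff-gaussian-rasterization's code.

def gaussian_alpha(pixel, mean2d, cov2d, opacity):
    """Opacity contribution of one projected 2D Gaussian at a pixel."""
    d = np.asarray(pixel, dtype=float) - np.asarray(mean2d, dtype=float)
    inv_cov = np.linalg.inv(np.asarray(cov2d, dtype=float))
    falloff = np.exp(-0.5 * d @ inv_cov @ d)  # 1.0 at the center, decays outward
    return float(opacity * falloff)

def composite_pixel(pixel, gaussians):
    """Front-to-back alpha compositing of depth-sorted Gaussians.

    gaussians: list of dicts with keys depth, mean2d, cov2d, opacity, color.
    """
    color = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed by nearer Gaussians
    for g in sorted(gaussians, key=lambda g: g["depth"]):  # nearest first
        alpha = gaussian_alpha(pixel, g["mean2d"], g["cov2d"], g["opacity"])
        color += transmittance * alpha * np.asarray(g["color"], dtype=float)
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # stop once the pixel is effectively opaque
            break
    return color
```

With two Gaussians centered on the pixel, a half-transparent red one in front and an opaque blue one behind, the result is `[0.5, 0.0, 0.5]`: the front Gaussian takes half of the light and the back one fills the remaining transmittance. Because every step is differentiable, gradients from an image loss can flow back to the Gaussian parameters.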

The repo integrates the diff-gaussian-rasterization submodule for efficient Gaussian splatting rasterization and follows pixelSplat's camera conventions for camera parameterization. The backbone is based on VGGT (Visual Geometry Grounded Transformer), a transformer-based 3D geometry model, and NAS3R's model is trained on the RealEstate10K dataset at a resolution of 256x256 pixels.
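
pixelSplat-style pipelines typically store intrinsics normalized by image size and extrinsics as camera-to-world matrices in the OpenCV convention (x right, y down, z forward). Assuming that convention carries over to NAS3R, the sketch below normalizes a pixel-space intrinsic matrix and casts a world-space ray through a pixel. The helper names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating a pixelSplat-style camera convention (assumed, not verified).

def normalize_intrinsics(K, width, height):
    """Divide the first row of K by image width and the second by image height."""
    K_norm = np.array(K, dtype=float)
    K_norm[0, :] /= width   # fx, skew, cx -> fractions of the image width
    K_norm[1, :] /= height  # fy, cy -> fractions of the image height
    return K_norm

def pixel_ray(K_norm, c2w, u, v):
    """World-space ray through normalized image coordinates (u, v) in [0, 1].

    Assumes OpenCV convention: camera looks along +z, x right, y down.
    """
    pixel_h = np.array([u, v, 1.0])
    dir_cam = np.linalg.inv(K_norm) @ pixel_h  # direction in the camera frame
    dir_world = c2w[:3, :3] @ dir_cam          # rotate into the world frame
    origin = c2w[:3, 3]                        # camera center in world coordinates
    return origin, dir_world / np.linalg.norm(dir_world)
```

For a 256x256 image with a 128-pixel focal length and the principal point at the center, the normalized matrix has 0.5 in the focal and principal-point entries, and the ray through the image center points straight along the camera's +z axis. Normalized intrinsics have the practical advantage of staying unchanged when an image is resized.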

NAS3R supports configurations ranging from just 2 views up to 10 views, allowing it to fuse information from multiple cameras. It provides an option to initialize the VGGT backbone from pretrained weights, which can stabilize training but is not mandatory — highlighting the self-supervised ambition.
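
One way to picture multi-view fusion: pixel-aligned pipelines such as NoPoSplat predict Gaussians on a per-pixel grid, and the per-view Gaussians are expressed in a shared frame and concatenated into one scene. The sketch below assumes one Gaussian per pixel and places the Gaussian centers by unprojecting hypothetical per-view depth maps with normalized intrinsics and camera-to-world poses. It is an illustration of the idea, not NAS3R's actual prediction head.

```python
import numpy as np

# Hypothetical sketch: one Gaussian center per pixel, fused across views by concatenation.

def unproject_depth(depth, K_norm, c2w):
    """Lift an (H, W) z-depth map to (H*W, 3) world-space points, one per pixel."""
    H, W = depth.shape
    u = (np.arange(W) + 0.5) / W  # pixel centers in normalized [0, 1] coordinates
    v = (np.arange(H) + 0.5) / H
    uu, vv = np.meshgrid(u, v)    # each has shape (H, W)
    pix = np.stack([uu, vv, np.ones_like(uu)], axis=-1).reshape(-1, 3)
    rays_cam = pix @ np.linalg.inv(K_norm).T   # camera-frame rays with z = 1
    pts_cam = rays_cam * depth.reshape(-1, 1)  # scale each ray by its z-depth
    return pts_cam @ c2w[:3, :3].T + c2w[:3, 3]

def fuse_views(depths, K_norms, c2ws):
    """Concatenate per-view pixel-aligned centers into a single Gaussian set."""
    return np.concatenate(
        [unproject_depth(d, K, T) for d, K, T in zip(depths, K_norms, c2ws)], axis=0
    )
```

Under the one-Gaussian-per-pixel assumption, a 256x256 view contributes 65,536 Gaussians, so 2 views yield 131,072 and 10 views yield 655,360. That linear growth is one reason memory use rises quickly with the number of input views.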

The project also offers evaluation scripts that assess both novel view synthesis quality and camera pose estimation accuracy. The codebase builds on several related projects, including SPFSplat, NoPoSplat, pixelSplat, DUSt3R, and CroCo.
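
Novel view synthesis in this family of methods is typically scored with PSNR, alongside SSIM and LPIPS, by comparing renderings at held-out target views against the real frames. The following is a minimal PSNR sketch for images scaled to [0, 1]. It implements the standard formula and is not code taken from NAS3R's evaluation scripts.

```python
import numpy as np

def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    diff = np.asarray(rendered, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

A rendering that is off by 0.1 at every pixel has an MSE of 0.01, which corresponds to 20 dB. Higher values indicate closer agreement.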

technical strengths and tradeoffs: feed-forward splatting meets self-supervision

The standout feature of NAS3R is its ability to work in a fully self-supervised manner without pretrained priors, which is rare in the Gaussian splatting domain. Typically, such methods rely on supervised pose data or pretrained backbones to ensure convergence and stability.
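
The reason self-supervision can work at all is that rendering gives a training signal: a wrong pose produces a rendering that does not match the observed image. The toy below shows that principle in one dimension by recovering an unknown camera offset purely from photometric error, with no pose label. It is a hypothetical illustration of photometric self-supervision, not NAS3R's training loop, which learns a feed-forward network rather than searching over poses per scene.

```python
import numpy as np

# Toy illustration only: a 1D "scene" and a "camera" that views it with a shift.

def render_1d(scene, shift):
    """Render a periodic 1D scene from a camera offset by `shift` samples."""
    return np.roll(scene, -shift)

def estimate_shift(scene, observed, max_shift):
    """Pick the offset whose rendering best matches the observed view (no labels)."""
    losses = {
        s: float(np.mean((render_1d(scene, s) - observed) ** 2))
        for s in range(-max_shift, max_shift + 1)
    }
    return min(losses, key=losses.get)  # offset with the lowest photometric error
```

Given a random scene and a view rendered with an offset of 5, the search returns 5, because only the correct offset drives the photometric error to zero.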

Because the pipeline is feed-forward, the VGGT-based network estimates geometry and camera poses in a single pass instead of optimizing each scene separately. The optional pretrained initialization path is a practical concession to improve stability, reflecting a tradeoff between pure self-supervision and practical training dynamics.

The use of Hydra for configuration management is a good choice for complex experiments, as it cleanly separates config from code, making it easier to replicate and extend.
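
Hydra lets you change nested config values from the command line with dotted `key=value` overrides, which is what makes experiment variants cheap to run. The sketch below mimics that override behavior on a plain nested dict so the idea is visible without installing Hydra. It is not Hydra's implementation: unlike Hydra, it silently creates missing keys instead of rejecting them, and the config keys shown in the usage note are hypothetical.

```python
import copy

def _coerce(value):
    """Convert an override string to bool, int, or float where possible."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value  # leave anything else as a string

def apply_overrides(config, overrides):
    """Return a copy of nested-dict `config` with `a.b.c=value` overrides applied."""
    result = copy.deepcopy(config)  # never mutate the base config
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = _coerce(raw)
    return result
```

For example, `apply_overrides(base, ["model.encoder.pretrained=true", "dataset.num_views=10"])` switches on a hypothetical pretrained-initialization flag and raises a hypothetical view count while leaving `base` untouched.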

From a codebase perspective, the repo is well structured, with Gaussian rasterization isolated in the diff-gaussian-rasterization submodule and camera modeling following pixelSplat's conventions. This modularity aids clarity and maintainability, but it also means users need to understand several moving parts.

One limitation is the reliance on a fixed input resolution (256x256), which might limit applications requiring higher fidelity. Also, while the multi-view support extends up to 10 views, real-world scenarios with more views might need further adaptation.
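
In practice, a fixed 256x256 input means your own images must be cropped and resized first, and the camera intrinsics must be updated to match. The sketch below assumes a center square crop followed by a uniform resize and works in pixel units with a continuous-coordinate convention (no half-pixel correction). The function name is hypothetical, and NAS3R's own preprocessing may differ.

```python
def center_crop_resize_intrinsics(fx, fy, cx, cy, width, height, target=256):
    """Update pixel intrinsics for a center square crop, then a resize to target x target."""
    side = min(width, height)
    x0 = (width - side) / 2.0   # left edge of the crop in the original image
    y0 = (height - side) / 2.0  # top edge of the crop in the original image
    scale = target / side       # uniform resize factor
    # Cropping shifts the principal point; resizing scales focal lengths and principal point.
    return fx * scale, fy * scale, (cx - x0) * scale, (cy - y0) * scale
```

For a 640x360 frame with fx = fy = 300 and the principal point at (320, 180), the crop keeps the central 360x360 region, the principal point lands at (128, 128), and the focal length becomes about 213.3 pixels.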

The evaluation covers both novel view synthesis and pose estimation, which is crucial for judging the reconstruction pipeline from both the geometry side and the camera side.
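
Pose accuracy is commonly measured as angular errors on relative poses: the geodesic angle between predicted and ground-truth rotations, and the angle between translation directions. Using a direction angle for translation makes the metric insensitive to the global scale ambiguity that self-supervised reconstruction cannot resolve. Results are often summarized as an AUC over angular thresholds. The sketch below shows the two standard angle computations; it is not taken from NAS3R's evaluation scripts.

```python
import numpy as np

def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle between two 3x3 rotation matrices, in degrees."""
    R_rel = R_pred.T @ R_gt
    cos = (np.trace(R_rel) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))  # clip guards rounding

def translation_angle_deg(t_pred, t_gt):
    """Angle between translation directions in degrees (ignores scale)."""
    cos = np.dot(t_pred, t_gt) / (np.linalg.norm(t_pred) * np.linalg.norm(t_gt))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```

A rotation of 30 degrees about any axis gives a rotation error of 30 degrees, and a predicted translation that is the ground truth scaled by 2 gives a translation error of 0.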

quick start: setting up nas3r environment and dependencies

The project provides clear installation instructions using conda and pip. You clone the repository with its submodules, create a Python 3.11 environment, and install pinned PyTorch (2.5.1) and torchvision (0.20.1) builds for CUDA 12.1 along with the remaining dependencies.

# Clone with submodules
git clone --recurse-submodules git@github.com:ranrhuang/NAS3R.git
cd NAS3R

# Create and activate conda environment
conda create -n nas3r python=3.11 -y
conda activate nas3r

# Install PyTorch and torchvision with CUDA 12.1 support
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121

# Install requirements and the diff-gaussian-rasterization submodule
pip install -r requirements.txt --no-build-isolation
pip install -e submodules/diff-gaussian-rasterization --no-build-isolation

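This setup pins the dependencies that the training and evaluation scripts expect. The --no-build-isolation flag tells pip to build packages inside the current environment rather than in a temporary isolated one. The diff-gaussian-rasterization CUDA extension needs the already-installed PyTorch at build time, which is why PyTorch is installed first and why the environment has to stay consistent.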
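
After installation, a quick sanity check can confirm that the key packages are importable and that PyTorch sees a GPU. The script below is a hypothetical helper, not part of the repo, and it assumes the rasterizer installs under the module name `diff_gaussian_rasterization`, as the upstream package does. It uses `importlib.util.find_spec` so that a missing package is reported instead of crashing the check.

```python
import importlib
import importlib.util

def check_environment():
    """Report whether key packages are importable, plus PyTorch/CUDA details."""
    report = {}
    # Module name for the rasterizer is an assumption based on the upstream package.
    for module in ("torch", "torchvision", "diff_gaussian_rasterization"):
        report[module] = importlib.util.find_spec(module) is not None
    if report["torch"]:
        torch = importlib.import_module("torch")
        report["torch_version"] = torch.__version__
        report["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
        report["cuda_available"] = torch.cuda.is_available()
    return report

if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

In a correctly set up `nas3r` environment, all three packages should report `True`, the PyTorch version should be a 2.5.1 build, and `torch_cuda` should read 12.1.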

verdict: who nas3r is for and what to expect

NAS3R is a solid codebase for researchers and practitioners interested in self-supervised 3D reconstruction and camera pose estimation. Its main appeal is achieving these tasks without ground-truth poses or pretrained models, which is uncommon and worth exploring if you are working on similar problems.

The repo is best suited for those comfortable navigating machine learning codebases with multiple dependencies and submodules. It’s not a plug-and-play solution for out-of-the-box 3D reconstruction but rather a research-grade framework that offers flexibility and insight into self-supervised Gaussian splatting.

Limitations include the fixed input resolution, the reliance on a single dataset (RealEstate10K) for training, and the complex dependency management. Hardware requirements are also non-trivial: the Gaussian rasterizer is a CUDA extension, so training and rendering need an NVIDIA GPU.

Overall, NAS3R is worth understanding if you want to experiment with self-supervised approaches to 3D geometry and camera estimation using Gaussian splatting, especially if you want to see how pretrained vs. non-pretrained paths affect stability and results.

→ GitHub Repo: ranrhuang/NAS3R ⭐ 73 · Python

Noureddine RAMDI / NAS3R: Self-supervised 3D reconstruction and camera pose estimation with Gaussian splatting

what NAS3R does: self-supervised 3d reconstruction with gaussian splatting

technical strengths and tradeoffs: feed-forward splatting meets self-supervision

quick start: setting up nas3r environment and dependencies

verdict: who nas3r is for and what to expect

Related Articles