DROID-W tackles a practical challenge in visual SLAM: how to handle casually captured video sequences that include dynamic content and real-world uncertainties. While many SLAM systems assume static scenes, DROID-W extends the DROID-SLAM pipeline to jointly estimate camera trajectory, scene structure, and a notion of dynamic uncertainty, making it more robust to in-the-wild footage.
What DROID-W does: dynamic SLAM with uncertainty and metric depth
At its core, DROID-W is an extension of DROID-SLAM, a dense visual SLAM system originally designed for static environments. This repo adapts the pipeline to handle dynamic scenes by incorporating dynamic uncertainty estimation alongside camera pose and scene reconstruction.
The architecture relies on the lietorch library for Lie group operations, so camera poses are optimized directly over SE(3) transformations. Because updates are applied on the group itself, each refined pose remains a valid rigid transformation while it is optimized alongside scene structure.
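To make the idea of updating poses on SE(3) concrete, here is a minimal NumPy sketch of the exponential map and a left-multiplicative pose update. It does not use lietorch and is not the repo's code. The twist ordering (translation first, then rotation) and the function names `hat`, `se3_exp`, and `retract` are assumptions chosen for illustration.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric matrix."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def se3_exp(xi):
    """Exponential map from a twist xi = (v, w) in R^6 to a 4x4 SE(3) matrix.

    Assumed ordering (illustrative): xi[:3] is the translational part,
    xi[3:] is the rotational part (axis times angle).
    """
    xi = np.asarray(xi, dtype=float)
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # Taylor expansions avoid division by a near-zero angle.
        A = 1.0 - theta**2 / 6.0
        B = 0.5 - theta**2 / 24.0
        C = 1.0 / 6.0 - theta**2 / 120.0
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
    R = np.eye(3) + A * W + B * (W @ W)   # Rodrigues' rotation formula
    V = np.eye(3) + B * W + C * (W @ W)   # couples rotation and translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def retract(T, xi):
    """Apply a tangent-space update to a pose: T_new = exp(xi) @ T."""
    return se3_exp(xi) @ T

if __name__ == "__main__":
    pose = np.eye(4)
    step = [0.1, 0.0, 0.0, 0.0, 0.0, np.pi / 2]  # small shift plus a 90 degree yaw
    print(retract(pose, step))
```

An optimizer working this way solves for a 6-vector increment at each iteration and applies it through `retract`, so the rotation block stays orthonormal without any re-normalization step.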
On top of this, DROID-W integrates differentiable Gaussian rasterization techniques to represent scene geometry probabilistically, enabling a soft, differentiable mapping process that can accommodate dynamic elements and uncertainty.
A key addition is a metric depth estimator (which depends on MMCV) that provides scale-aware depth predictions rather than relative depth. This is crucial for producing metrically consistent reconstructions from monocular video.
The system supports multiple benchmarks focused on dynamic scenes, including Bonn, TUM RGB-D, and DyCheck, as well as a custom dataset and real YouTube sequences for testing under varied real-world conditions.
Under the hood, the repo is primarily Python-based with CUDA extensions for performance-critical parts, requiring CUDA 11.8 and PyTorch 2.1.0. Several custom CUDA extensions are built from source to support the core optimization and rendering pipelines.
Technical strengths: handling dynamic uncertainty with Lie group optimization and Gaussian splatting
What sets DROID-W apart is its joint estimation of camera trajectory, scene structure, and dynamic uncertainty. Rather than treating dynamic objects as outliers or ignoring them, it explicitly models uncertainty in the dynamic parts of the scene.
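One way to picture what modeling uncertainty buys you, compared with treating everything equally, is a weighted least-squares fit in which each correspondence is down-weighted by its uncertainty. The sketch below is a toy 2D translation estimate, not DROID-W's actual formulation. The function name `weighted_translation` and the weighting rule `1 / (u^2 + eps)` are assumptions made for illustration.

```python
import numpy as np

def weighted_translation(src, dst, uncertainty, eps=1e-6):
    """Closed-form weighted least-squares estimate of a 2D translation t.

    Minimizes sum_i w_i * ||dst_i - (src_i + t)||^2 with
    w_i = 1 / (uncertainty_i^2 + eps), so uncertain points contribute little.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    w = 1.0 / (np.asarray(uncertainty, dtype=float) ** 2 + eps)
    diff = dst - src
    return (w[:, None] * diff).sum(axis=0) / w.sum()

if __name__ == "__main__":
    src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    true_t = np.array([1.0, 0.0])
    dst = src + true_t
    dst[3] = src[3] + np.array([5.0, 5.0])   # last point lies on a moving object

    uniform = weighted_translation(src, dst, np.ones(4))
    aware = weighted_translation(src, dst, [0.1, 0.1, 0.1, 10.0])
    print("uniform weights:    ", uniform)
    print("uncertainty weights:", aware)
```

With uniform weights the moving point drags the estimate away from the true camera motion. With a high uncertainty on that point, the estimate stays close to the motion implied by the static points, and the point is still kept in the problem rather than discarded.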
The use of Lie group optimization (lietorch) is a solid choice for pose and transformation updates, providing a mathematically sound framework for consistent optimization over SE(3). Unlike optimizing raw rotation-matrix entries or Euler angles, updates computed in the tangent space keep rotations valid and avoid parameterization singularities, which fits well with SLAM’s geometric nature.
Differentiable Gaussian rasterization adds a probabilistic mapping layer that can capture the fuzziness inherent in dynamic scenes, where object boundaries and movements introduce ambiguity. This allows gradient-based refinement of the map and poses in a way that gracefully handles uncertainty.
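The core of Gaussian rasterization is front-to-back alpha compositing of soft Gaussian footprints. The following sketch computes a single pixel's color from a handful of isotropic 2D Gaussians. It is a simplified illustration, not the repo's CUDA rasterizer: real splatting uses projected anisotropic covariances, and `composite_pixel` and its arguments are hypothetical names.

```python
import numpy as np

def composite_pixel(pixel, means, sigmas, opacities, colors, depths):
    """Blend isotropic 2D Gaussians at one pixel, nearest first.

    Returns (rgb, remaining_transmittance). Every operation is smooth in the
    Gaussian parameters, which is what makes the rendering differentiable.
    """
    pixel = np.asarray(pixel, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    opacities = np.asarray(opacities, dtype=float)
    colors = np.asarray(colors, dtype=float)

    rgb = np.zeros(3)
    transmittance = 1.0
    for i in np.argsort(depths):                  # near to far
        d2 = np.sum((pixel - means[i]) ** 2)      # squared distance to center
        alpha = opacities[i] * np.exp(-0.5 * d2 / sigmas[i] ** 2)
        rgb += transmittance * alpha * colors[i]  # light that reaches the eye
        transmittance *= 1.0 - alpha              # light still passing through
    return rgb, transmittance

if __name__ == "__main__":
    rgb, t = composite_pixel(
        pixel=[0.0, 0.0],
        means=[[0.0, 0.0], [0.0, 0.0]],
        sigmas=[1.0, 1.0],
        opacities=[0.5, 1.0],
        colors=[[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
        depths=[1.0, 2.0],
    )
    print(rgb, t)
```

A half-transparent red Gaussian in front of an opaque blue one yields an even mix of the two colors, which shows how soft footprints can express ambiguous boundaries instead of hard occlusion edges.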
Metric depth estimation, whose estimator depends on MMCV, is another strength. Many monocular SLAM systems only recover depth up to scale, but DROID-W aims for metric-scale depth, which is more useful for applications like robotics or AR.
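To illustrate what "up to scale" means in practice, the sketch below fits the single scale factor that best maps a relative depth map onto metric depth values in the least-squares sense. This is a generic alignment technique shown for intuition only. It is an assumption that anything like it applies here, and it is not taken from the DROID-W code. `fit_scale` is a hypothetical helper.

```python
import numpy as np

def fit_scale(relative_depth, metric_depth, valid=None):
    """Least-squares scale s minimizing sum (s * relative - metric)^2.

    Pixels with non-positive depth in either map are ignored unless an
    explicit boolean `valid` mask is given.
    """
    r = np.asarray(relative_depth, dtype=float).ravel()
    m = np.asarray(metric_depth, dtype=float).ravel()
    if valid is None:
        mask = (r > 0) & (m > 0)
    else:
        mask = np.asarray(valid, dtype=bool).ravel()
    r, m = r[mask], m[mask]
    if r.size == 0:
        raise ValueError("no valid depth pairs to fit a scale")
    return float(np.dot(r, m) / np.dot(r, r))

if __name__ == "__main__":
    relative = np.array([[1.0, 2.0], [4.0, 0.0]])   # 0.0 marks a missing value
    metric = 2.5 * relative                         # same geometry, in meters
    print(fit_scale(relative, metric))
```

A scale-ambiguous reconstruction is geometrically consistent but has no physical units. One multiplicative factor, here recovered from metric depth values, is enough to express it in meters.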
The codebase is surprisingly clean for such a complex system, with clear separation of concerns between optimization, mapping, and depth estimation modules. However, the tradeoff is complexity in setup and dependency management, given the need to build several custom CUDA extensions and ensure compatibility with specific CUDA and PyTorch versions.
The optional Gaussian Splatting-based mapping is an interesting feature, although it is disabled by default. This indicates the system is designed for extensibility and experimentation, but users should be aware that the default pipeline is the main supported path.
Quick start
- First, make sure that you clone the repo with the --recursive flag. The simplest way to set up the environment is to use Anaconda.
git clone --recursive https://github.com/MoyangLi00/DROID-W.git
cd DROID-W
- Create a new conda environment.
conda create --name droid-w python=3.10
conda activate droid-w
- Install CUDA 11.8 and torch-related packages
pip install numpy==1.26.3
conda install --channel "nvidia/label/cuda-11.8.0" cuda-toolkit
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-2.1.0+cu118.html
pip3 install -U xformers==0.0.22.post7+cu118 --index-url https://download.pytorch.org/whl/cu118
- Install the remaining dependencies.
python -m pip install -e thirdparty/lietorch --no-build-isolation
python -m pip install -e thirdparty/diff-gaussian-rasterization-w-pose --no-build-isolation
python -m pip install -e thirdparty/simple-knn --no-build-isolation
- Check installation.
python -c "import torch; import lietorch; import simple_knn; import diff_gaussian_rasterization; print(torch.cuda.is_available())"
- Now install the droid backends and the other requirements
python -m pip install -e . --no-build-isolation
python -m pip install -r requirements.txt
- Install MMCV (used by metric depth estimator)
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1.0/index.html
- Download the pretrained model droid.pth and put it inside the pretrained folder.
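The one-line installation check in the steps above stops at the first failed import. As a complement, here is a small sketch that reports every missing module at once and confirms that the checkpoint is in place. The module names come from the install steps (with `mmcv` assumed to be the import name of the MMCV package). The helper names and the `pretrained/droid.pth` location relative to the repo root are assumptions based on the instructions above.

```python
import importlib.util
from pathlib import Path

# Import names taken from the installation check; "mmcv" is assumed for MMCV.
REQUIRED_MODULES = (
    "torch",
    "lietorch",
    "simple_knn",
    "diff_gaussian_rasterization",
    "mmcv",
)

def check_environment(modules=REQUIRED_MODULES):
    """Return {module_name: True/False} without importing the modules."""
    report = {}
    for name in modules:
        try:
            report[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            report[name] = False
    return report

def checkpoint_present(repo_root="."):
    """Check for the droid.pth checkpoint inside the pretrained folder."""
    return (Path(repo_root) / "pretrained" / "droid.pth").is_file()

if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:32s} {'ok' if found else 'MISSING'}")
    print(f"{'pretrained/droid.pth':32s} {'ok' if checkpoint_present() else 'MISSING'}")
```

Run it from the repository root inside the activated droid-w environment. A module that is found can still fail at import time if it was built against a different CUDA or PyTorch version, so the original one-liner remains the stricter test.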
Verdict
DROID-W is relevant for researchers and practitioners working on visual SLAM in dynamic environments, especially where real-world, casually captured video sequences introduce motion and uncertainty that static SLAM systems struggle with.
The repo’s strengths lie in its principled approach to joint pose, structure, and uncertainty estimation using Lie group optimization and differentiable Gaussian mapping. The integration of metric depth estimation is a practical plus for applications needing scale-aware reconstruction.
The tradeoff is a complex setup process and dependency on specific CUDA and PyTorch versions, plus a need for an NVIDIA GPU with CUDA 11.8 support. The optional Gaussian Splatting mapping is an interesting direction but currently not enabled by default.
If you need a robust baseline for dynamic SLAM that goes beyond treating moving objects as noise, DROID-W offers a solid foundation. However, expect to invest time in environment setup and understanding the optimization backend to make the most of it.
→ GitHub Repo: MoyangLi00/DROID-W ⭐ 350 · Python