ComfyUI Trellis2: Extending ComfyUI with DINOv3 for 3D-Aware Diffusion Workflows

ComfyUI-Trellis2 extends the modular ComfyUI ecosystem by integrating the DINOv3 model, published under the facebook organization on Hugging Face, to enable 3D-aware diffusion workflows. ComfyUI itself focuses on visual, node-based workflows for diffusion models. Trellis2 adds DINOv3, a vision transformer, so that ComfyUI pipelines can extract and work with features that carry 3D spatial information. This gives AI practitioners a way to experiment with 3D spatial understanding in their image synthesis workflows.

Integration of Facebook DINOv3 Within ComfyUI for 3D-Aware Diffusion

At its core, ComfyUI-Trellis2 is a Python-based extension for ComfyUI that uses the dinov3-vitl16-pretrain-lvd1689m checkpoint, published by facebook on Hugging Face. This model is a vision transformer pre-trained to capture rich visual representations, which Trellis2 uses to add 3D awareness to diffusion processes.
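To make "vision transformer features" a little more concrete: the "16" in the checkpoint name follows the usual ViT naming convention for a 16x16-pixel patch size, so the number of patch tokens the model produces depends on the input resolution. The sketch below only does that arithmetic. The 512x512 input size is an illustrative assumption, not a value taken from the Trellis2 nodes, and the helper names are hypothetical.

```python
def patch_grid(height: int, width: int, patch_size: int = 16) -> tuple[int, int]:
    """Return (rows, cols) of the patch grid for an image of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of the patch size")
    return height // patch_size, width // patch_size


def num_patch_tokens(height: int, width: int, patch_size: int = 16) -> int:
    """Number of patch tokens, excluding any class or register tokens."""
    rows, cols = patch_grid(height, width, patch_size)
    return rows * cols


if __name__ == "__main__":
    # 512x512 is an assumed example resolution.
    print(patch_grid(512, 512))        # (32, 32)
    print(num_patch_tokens(512, 512))  # 1024
```

Each of those patch tokens is a feature vector that downstream nodes can condition on, which is why input resolution directly affects memory use.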

The architecture is designed to slot into the existing ComfyUI custom nodes framework. Users must clone the DINOv3 model repository into a specific folder within ComfyUI’s models directory to meet the dependency requirements. Because the location is fixed, the Trellis2 nodes can load the pretrained DINOv3 weights directly for inference.
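Because the nodes depend on the model sitting in a fixed location, it can help to verify the layout before launching ComfyUI. The following is a minimal sketch, not part of the repository. It assumes the model lives at models/facebook/dinov3-vitl16-pretrain-lvd1689m under the ComfyUI root and that a Hugging Face checkpoint folder contains a config.json file. The function names and the expected-file list are hypothetical.

```python
from pathlib import Path

# Sub-path under ComfyUI's models directory, as described in the install notes.
MODEL_SUBDIR = Path("facebook") / "dinov3-vitl16-pretrain-lvd1689m"


def find_model_dir(comfyui_root) -> Path:
    """Build the path where the DINOv3 checkpoint is expected to live."""
    return Path(comfyui_root) / "models" / MODEL_SUBDIR


def missing_files(model_dir, expected=("config.json",)) -> list[str]:
    """Return the expected file names that are absent from model_dir.

    The default list is an assumption about a typical Hugging Face checkpoint.
    """
    model_dir = Path(model_dir)
    return [name for name in expected if not (model_dir / name).is_file()]


if __name__ == "__main__":
    model_dir = find_model_dir("ComfyUI")  # assumed relative ComfyUI root
    missing = missing_files(model_dir)
    if missing:
        print(f"{model_dir} is missing: {', '.join(missing)}")
    else:
        print(f"{model_dir} looks complete")
```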

Under the hood, Trellis2 relies on PyTorch and several GPU-accelerated libraries. The project ships precompiled Python wheels for PyTorch 2.7.0 and 2.8.0 on Windows 11. These cover cumesh, nvdiffrast, and nvdiffrec_render, which handle mesh processing and differentiable rendering, along with flex_gemm and o_voxel.

With these pieces in place, ComfyUI gains nodes that can use DINOv3 features for spatial understanding, which may improve output quality in tasks that benefit from 3D context.

Technical Strengths and Tradeoffs of ComfyUI-Trellis2

One of Trellis2’s main technical strengths is that it integrates a modern vision transformer directly into a modular AI workflow environment. Users get access to 3D-aware features for experimentation beyond standard 2D diffusion, without building complex pipelines from scratch.

The code quality aligns with typical Python AI projects: modular, with a focus on node-based extensions. The repository provides prebuilt wheels to simplify installation of otherwise challenging GPU-accelerated dependencies, which is a thoughtful move given the typical pain points around compiling such libraries.

However, this design comes with tradeoffs. The reliance on specific PyTorch versions and Windows 11 limits cross-platform compatibility. Users need hardware with CUDA 12.8 support and a compatible GPU, which can be a barrier for some. Installation isn’t trivial: the DINOv3 model must be placed manually in the exact directory, and multiple wheels must be installed, so the process requires careful attention.

The prebuilt wheels ease setup, but because they are Windows-specific, Linux and Mac users must build the dependencies from source or wait for other distributions.
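These constraints can be checked up front. The sketch below is one possible pre-flight check, not something the repository provides. It encodes only the requirements named in this article (Windows, PyTorch 2.7.0 or 2.8.0, and the Python 3.11 tag visible in the wheel filenames). The function names are hypothetical, and the check does not verify the CUDA version or the GPU.

```python
import platform
import sys

SUPPORTED_TORCH = ("2.7.0", "2.8.0")


def environment_problems(system, python_version, torch_version):
    """Return human-readable problems for the given environment values."""
    problems = []
    if system != "Windows":
        problems.append(f"prebuilt wheels target Windows, found {system}")
    if tuple(python_version[:2]) != (3, 11):
        problems.append("wheels are tagged cp311, so Python 3.11 is expected")
    # torch versions often carry a local suffix such as "+cu128"; drop it.
    base = torch_version.split("+")[0] if torch_version else None
    if base not in SUPPORTED_TORCH:
        problems.append(f"expected torch in {SUPPORTED_TORCH}, found {base}")
    return problems


def detect_torch_version():
    """Return the installed torch version string, or None if torch is absent."""
    try:
        import torch
    except ImportError:
        return None
    return str(torch.__version__)


if __name__ == "__main__":
    found = environment_problems(
        platform.system(), sys.version_info, detect_torch_version()
    )
    for problem in found:
        print("warning:", problem)
    if not found:
        print("environment matches the documented configuration")
```

Keeping the check as a pure function of its inputs makes it easy to test without having the target environment available.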

In terms of developer experience, the repo expects users to have prior familiarity with ComfyUI and diffusion workflows. The documentation focuses on installation and dependencies but doesn’t extensively cover usage patterns or examples, which means that hands-on experimentation is necessary to unlock the full potential.

Quick Start with ComfyUI-Trellis2

The installation instructions are clear and were tested on Windows 11 with Python 3.11 and Torch 2.7.0 or 2.8.0. Users must first obtain the DINOv3 model from Hugging Face and clone it into the ComfyUI models folder under facebook/dinov3-vitl16-pretrain-lvd1689m.

For Torch 2.7.0, the wheels installation commands are:

python -m pip install ComfyUI/custom_nodes/ComfyUI-Trellis2/wheels/Windows/Torch270/cumesh-1.0-cp311-cp311-win_amd64.whl
python -m pip install ComfyUI/custom_nodes/ComfyUI-Trellis2/wheels/Windows/Torch270/nvdiffrast-0.4.0-cp311-cp311-win_amd64.whl
python -m pip install ComfyUI/custom_nodes/ComfyUI-Trellis2/wheels/Windows/Torch270/nvdiffrec_render-0.0.0-cp311-cp311-win_amd64.whl
python -m pip install ComfyUI/custom_nodes/ComfyUI-Trellis2/wheels/Windows/Torch270/flex_gemm-0.0.1-cp311-cp311-win_amd64.whl
python -m pip install ComfyUI/custom_nodes/ComfyUI-Trellis2/wheels/Windows/Torch270/o_voxel-0.0.1-cp311-cp311-win_amd64.whl

For Torch 2.8.0, similar commands target the corresponding wheel folder.

This approach is manual and tied to a specific Torch version, but it ensures that the required custom dependencies are correctly installed.
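Although the documented steps are manual, the commands differ only in the wheel filename, so they could be wrapped in a small script. The following is a sketch under that assumption and is not shipped with the repository. It installs every .whl file in a folder in alphabetical order, which differs from the order listed above. Whether install order matters for these wheels is not stated in the source, so treat that as an open question. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_commands(wheel_dir):
    """Build one 'python -m pip install <wheel>' command per wheel file."""
    wheels = sorted(Path(wheel_dir).glob("*.whl"))
    return [[sys.executable, "-m", "pip", "install", str(w)] for w in wheels]


def install_all(wheel_dir, dry_run=True):
    """Print each command, and run it unless dry_run is set."""
    for cmd in pip_install_commands(wheel_dir):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first wheel that fails to install.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Folder taken from the Torch 2.7.0 commands above; dry run by default.
    install_all("ComfyUI/custom_nodes/ComfyUI-Trellis2/wheels/Windows/Torch270")
```

Running with dry_run=True first shows exactly which wheels would be installed before anything changes in the environment.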

Verdict: Who Should Use ComfyUI-Trellis2

ComfyUI-Trellis2 is a specialized extension for AI practitioners and researchers who want to add 3D spatial understanding to diffusion workflows through DINOv3. If you already know ComfyUI and are comfortable managing Python environments, GPU dependencies, and PyTorch versions on Windows, this repo is worth exploring.

The tradeoff is clear: enhanced capability at the cost of installation complexity and platform limitations. If you need cross-platform support or a more plug-and-play solution, Trellis2 might not fit your workflow yet.

Overall, the repo represents a practical way to experiment with vision transformers in diffusion model pipelines, bridging the gap between 2D image generation and 3D-aware processing inside a modular node-based interface. It is worth understanding even if you don’t adopt it immediately.

→ GitHub Repo: visualbruno/ComfyUI-Trellis2 ⭐ 534 · Python

Noureddine RAMDI / ComfyUI Trellis2: Extending ComfyUI with Dinov3 for 3D-Aware Diffusion Workflows
