Noureddine RAMDI / PartCrafter: compositional 3D mesh generation with latent diffusion transformers

Created Sat, 23 May 2026 20:41:14 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

wgsxm/PartCrafter

PartCrafter approaches 3D mesh generation differently by producing multiple semantically meaningful parts simultaneously from a single RGB image, rather than generating a single monolithic mesh. This compositional paradigm opens up possibilities for downstream editing, manipulation, and scene composition that typical 3D generators don’t support.

compositional 3d mesh generation from a single rgb image

At its core, PartCrafter is a Python-based implementation of a NeurIPS 2025 paper that introduces compositional 3D mesh generation using Latent Diffusion Transformers. The system generates multiple part-level meshes (e.g., chair legs, seat, back) in one shot from a single RGB image, explicitly modeling the structure of the object.

The architecture builds on a pretrained DiT (Diffusion Transformer) backbone from TripoSG. The DiT is fine-tuned while the Variational Autoencoder (VAE) stays frozen, so diffusion runs over the VAE's latent representations of the mesh parts. This latent diffusion transformer outputs all parts simultaneously. Monolithic 3D mesh generators, by contrast, produce a single mesh for the entire object.
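To make "outputs all parts simultaneously" more concrete, here is a minimal sketch of how several parts can share one transformer pass. It assumes each part is represented as a list of latent tokens. The tokens of all parts are concatenated into one sequence, each token is tagged with a part-identity vector, and the sequence is grouped back per part afterwards. The helper names and the embedding table are hypothetical illustrations, not PartCrafter's code. In a trained model the identity vectors would be learned rather than hand-written.

```python
from typing import List, Tuple

Token = List[float]          # one latent token (a vector)
PartLatent = List[Token]     # all latent tokens of one part


def pack_parts(parts: List[PartLatent]) -> Tuple[List[Token], List[int]]:
    """Flatten per-part token lists into one joint sequence plus a part ID per token."""
    tokens: List[Token] = []
    part_ids: List[int] = []
    for part_id, part in enumerate(parts):
        for tok in part:
            tokens.append(list(tok))
            part_ids.append(part_id)
    return tokens, part_ids


def add_part_embeddings(tokens: List[Token], part_ids: List[int],
                        table: List[Token]) -> List[Token]:
    """Add a (hypothetical) per-part identity vector so the model can tell parts apart."""
    return [[x + e for x, e in zip(tok, table[pid])] for tok, pid in zip(tokens, part_ids)]


def unpack_parts(tokens: List[Token], part_ids: List[int], num_parts: int) -> List[PartLatent]:
    """Group the joint sequence back into one token list per part (e.g. before decoding)."""
    parts: List[PartLatent] = [[] for _ in range(num_parts)]
    for tok, pid in zip(tokens, part_ids):
        parts[pid].append(tok)
    return parts


# Toy usage: a 1-token part and a 2-token part with 2-dimensional latents.
parts = [[[0.5, 0.25]], [[1.0, 2.0], [3.0, 4.0]]]
tokens, ids = pack_parts(parts)
model_input = add_part_embeddings(tokens, ids, table=[[0.0, 0.0], [1.0, 1.0]])
print(ids)          # [0, 1, 1]
print(model_input)  # [[0.5, 0.25], [2.0, 3.0], [4.0, 5.0]]
```

Because every part lives in the same sequence, attention layers can relate tokens across parts. Each part can still be decoded into its own mesh.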

Additionally, the system supports both object-level and scene-level generation, enabling compositional scene synthesis from images.

To bridge the gap between real-world photos and the Objaverse domain, PartCrafter integrates style transfer techniques via Gemini, allowing the model to generalize better to real-world inputs. Another notable integration is a Vision-Language Model (VLM) based part count suggestion mechanism that automatically infers how many parts to generate from an input image, reducing manual parameter tuning.

The repo is built in Python and uses PyTorch 2.5.1 with CUDA 12.4 support, targeting NVIDIA H20 GPUs for training and inference. The codebase is fully open source, with pretrained model weights available on HuggingFace.

architectural strengths and design tradeoffs

What sets PartCrafter apart is its compositional approach to 3D mesh generation. Instead of producing a single mesh, it simultaneously generates multiple part-level meshes with semantic meaning. This explicit part-level structure enables downstream tasks such as part manipulation, editing, or recomposition, which are difficult to achieve with monolithic mesh outputs.
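To show why separate parts make editing easy, the sketch below moves one part without touching the others. It then recomposes all parts into a single mesh by shifting each part's face indices by the number of vertices already merged. `PartMesh`, `translate`, and `merge_parts` are hypothetical plain-Python stand-ins. In practice a mesh library would handle this, and PartCrafter's own output format is not assumed here.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


@dataclass
class PartMesh:
    name: str
    vertices: List[Vertex]
    faces: List[Face]  # indices into this part's own vertex list


def translate(part: PartMesh, offset: Vertex) -> PartMesh:
    """Return a moved copy of one part; other parts are untouched."""
    dx, dy, dz = offset
    moved = [(x + dx, y + dy, z + dz) for x, y, z in part.vertices]
    return PartMesh(part.name, moved, list(part.faces))


def merge_parts(parts: List[PartMesh]) -> Tuple[List[Vertex], List[Face]]:
    """Concatenate parts into one mesh, offsetting face indices per part."""
    vertices: List[Vertex] = []
    faces: List[Face] = []
    for part in parts:
        base = len(vertices)  # where this part's vertices start in the merged list
        vertices.extend(part.vertices)
        faces.extend((a + base, b + base, c + base) for a, b, c in part.faces)
    return vertices, faces


tri = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
seat = PartMesh("seat", tri, [(0, 1, 2)])
leg = translate(PartMesh("leg", tri, [(0, 1, 2)]), (0.0, 0.0, -1.0))
verts, faces = merge_parts([seat, leg])
print(len(verts), faces)  # 6 [(0, 1, 2), (3, 4, 5)]
```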

Fine-tuning a latent diffusion transformer from the pretrained DiT backbone is a key architectural choice. Diffusion models have shown strong generative capabilities, and a transformer denoiser can capture long-range dependencies and complex structure in the latent space. The frozen VAE compresses high-dimensional mesh data into a compact latent space, where diffusion runs far more efficiently than it would on raw geometry.
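To illustrate what "diffusion in latent space" means at inference time, here is a minimal sampler sketch. It assumes a rectified-flow formulation like the one TripoSG (the source of the backbone) is built on. Whether PartCrafter uses this exact path and an Euler integrator is an assumption. The trained DiT is replaced by an "oracle" that knows the clean latent, so the loop should recover it exactly. In the real pipeline the recovered latents would then go to the frozen VAE decoder. `euler_sample` and `oracle` are hypothetical names.

```python
import random
from typing import Callable, List

Latent = List[float]
VelocityFn = Callable[[Latent, float], Latent]


def euler_sample(velocity_fn: VelocityFn, noise: Latent, num_steps: int) -> Latent:
    """Integrate dx/dt = v(x, t) with Euler steps from t=1 (pure noise) to t=0 (clean latent).

    Assumes the rectified-flow path x_t = (1 - t) * x0 + t * noise,
    whose velocity is v = noise - x0.
    """
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt                             # current time, never 0 inside the loop
        v = velocity_fn(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]   # step from t to t - dt
    return x


# Toy check: the oracle stands in for the trained DiT. It knows the clean latent x0
# and returns the exact velocity (x_t - x0) / t, so sampling should recover x0.
x0 = [0.5, -1.0, 2.0]
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in x0]


def oracle(x: Latent, t: float) -> Latent:
    return [(xi - ci) / t for xi, ci in zip(x, x0)]


result = euler_sample(oracle, noise, num_steps=50)
print([round(r, 6) for r in result])  # [0.5, -1.0, 2.0]
```

A real denoiser only approximates the velocity and is conditioned on the input image. The number of steps therefore trades speed against sample quality.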

The integration of VLM-based part count suggestion is a practical enhancement. Instead of requiring users to guess the number of parts, the system uses a vision-language model to propose a part count, improving usability for real-world inputs.
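A VLM answers in free text, so a part-count suggestion needs a robust parsing step before it can feed the generator. Here is a minimal sketch of that step. The prompt, the clamping bounds, and the fallback default are hypothetical values chosen for illustration, not taken from the repo.

```python
import re

# Hypothetical prompt; PartCrafter's actual VLM prompt is not reproduced here.
PART_COUNT_PROMPT = (
    "How many distinct, semantically meaningful parts does the main object "
    "in this image have? Answer with a single integer."
)


def parse_part_count(vlm_reply: str, min_parts: int = 1,
                     max_parts: int = 16, default: int = 4) -> int:
    """Extract the first integer from a VLM reply and clamp it to a supported range.

    min_parts, max_parts, and default are hypothetical bounds for illustration.
    """
    match = re.search(r"\d+", vlm_reply)
    if match is None:
        return default  # the model did not give a number
    return max(min_parts, min(max_parts, int(match.group())))


print(parse_part_count("The chair has 5 parts."))  # 5
print(parse_part_count("I count 40 pieces"))       # 16 (clamped)
print(parse_part_count("Hard to tell."))           # 4 (fallback)
```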

Style transfer via Gemini is another thoughtful addition. It addresses the domain gap between the synthetic Objaverse renderings used for training and the real-world photos seen at inference. This improves robustness but adds complexity and a dependency on an external model.

The main tradeoff is compute. The reference setup uses high-end NVIDIA H20 GPUs on a PyTorch 2.5.1 / CUDA 12.4 stack. Generating multiple parts simultaneously also means the model processes latent tokens for every part. This tends to make training and inference more expensive than in simpler single-mesh generators.
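A rough back-of-envelope calculation shows why more parts cost more. Self-attention compares every query token with every key token, so its cost grows with the square of the sequence length. The sketch below counts only query-key pairs and makes two assumptions: each part uses as many latent tokens as a monolithic model, and `tokens_per_part = 512` (a hypothetical value). If the real model mixes within-part and cross-part attention, its cost would fall between the two cases shown.

```python
def attention_cost_ratio(num_parts: int, tokens_per_part: int, joint: bool = True) -> float:
    """Query-key pair count relative to a single-part sequence of tokens_per_part tokens."""
    single = tokens_per_part ** 2
    if joint:
        total = (num_parts * tokens_per_part) ** 2   # every token attends to every token
    else:
        total = num_parts * tokens_per_part ** 2     # attention restricted to each part
    return total / single


print(attention_cost_ratio(8, 512))               # 64.0
print(attention_cost_ratio(8, 512, joint=False))  # 8.0
```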

The codebase itself appears well organized, with separate scripts for inference and training and explicit environment setup instructions that ease onboarding. However, the model and tooling are research-focused, so users should expect some setup overhead and experimental behavior.

quick start with partcrafter

The repo provides clear installation and quickstart instructions. Environment setup requires Python 3.11 and PyTorch 2.5.1 with CUDA 12.4. Here are the commands copied verbatim from the README:

conda create -n partcrafter python=3.11.13
conda activate partcrafter
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
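Version mismatches are a common source of setup problems, so it can help to confirm that the active environment matches these pins before going further. The following is a hedged sketch of such a check. `check_environment` is a hypothetical helper, not part of the repo. It only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util
import sys
from typing import List


def check_environment(torch_version: str = "2.5.1", cuda_version: str = "12.4") -> List[str]:
    """Return mismatches against the README setup; an empty list means everything matches."""
    problems: List[str] = []
    if sys.version_info[:2] != (3, 11):
        problems.append(f"Python {sys.version_info.major}.{sys.version_info.minor} (README uses 3.11)")
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not installed")
        return problems
    import torch

    if not torch.__version__.startswith(torch_version):  # e.g. "2.5.1+cu124"
        problems.append(f"torch {torch.__version__} (README pins {torch_version})")
    if torch.version.cuda != cuda_version:  # None for CPU-only builds
        problems.append(f"torch built for CUDA {torch.version.cuda} (README uses {cuda_version})")
    if not torch.cuda.is_available():
        problems.append("no CUDA device visible to PyTorch")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```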

Then clone and install dependencies:

git clone https://github.com/wgsxm/PartCrafter.git
cd PartCrafter
bash settings/setup.sh

For users without root access, additional graphics libraries can be installed via conda:

conda install -c conda-forge libegl libglu pyopengl

To generate a mesh with 3 parts from one of the example images and render the result, run:

python scripts/inference_partcrafter.py \
  --image_path assets/images/np3_2f6ab901c5a84ed6bbdf85a67b22a2ee.png \
  --num_parts 3 --tag robot --render

The needed pretrained weights are fetched automatically:

  • PartCrafter model from wgsxm/PartCrafter → pretrained_weights/PartCrafter
  • RMBG model from briaai/RMBG-1.4 → pretrained_weights/RMBG-1.4
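If the inference machine lacks internet access, the weights can be fetched ahead of time. Below is a hedged sketch using `huggingface_hub.snapshot_download` with the repo IDs and folders listed above. It assumes the inference script reuses already-populated folders instead of downloading again, which has not been verified here. `missing_weights` and `prefetch_weights` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

# Hugging Face repo IDs and local folders, as listed above.
WEIGHTS = {
    "wgsxm/PartCrafter": "pretrained_weights/PartCrafter",
    "briaai/RMBG-1.4": "pretrained_weights/RMBG-1.4",
}


def missing_weights(root: str = ".") -> List[str]:
    """Return the repo IDs whose local folder under `root` is absent or empty."""
    missing = []
    for repo_id, local_dir in WEIGHTS.items():
        target = Path(root) / local_dir
        if not target.is_dir() or not any(target.iterdir()):
            missing.append(repo_id)
    return missing


def prefetch_weights(root: str = ".") -> None:
    """Download any missing snapshot into its listed folder."""
    # Imported lazily so missing_weights works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in missing_weights(root):
        snapshot_download(repo_id=repo_id, local_dir=str(Path(root) / WEIGHTS[repo_id]))


if __name__ == "__main__":
    prefetch_weights()  # run from the PartCrafter repo root
```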

The generated results are saved under ./results/robot. Several example images are included in ./assets/images with filenames encoding recommended part counts.
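The example filename `np3_...png` is paired with `--num_parts 3`, which suggests an `npN_` prefix convention. That reading is inferred from this one example, not confirmed by the repo. Under that assumption, here is a minimal sketch that reads the recommended count from the filename and assembles the same command, using only the flags shown above. `recommended_parts` and `build_command` are hypothetical helpers.

```python
import re
import shlex
from pathlib import Path
from typing import List, Optional


def recommended_parts(image_path: str) -> Optional[int]:
    """Read the part count from an 'npN_' filename prefix (convention inferred, not confirmed)."""
    match = re.match(r"np(\d+)_", Path(image_path).name)
    return int(match.group(1)) if match else None


def build_command(image_path: str, tag: str, render: bool = True) -> List[str]:
    """Assemble the inference command shown above with the recommended part count."""
    num_parts = recommended_parts(image_path)
    if num_parts is None:
        raise ValueError(f"no npN_ prefix in {image_path}; pass the part count explicitly")
    cmd = ["python", "scripts/inference_partcrafter.py",
           "--image_path", image_path,
           "--num_parts", str(num_parts),
           "--tag", tag]
    if render:
        cmd.append("--render")
    return cmd


if __name__ == "__main__":
    cmd = build_command("assets/images/np3_2f6ab901c5a84ed6bbdf85a67b22a2ee.png", tag="robot")
    print(shlex.join(cmd))
    # To actually run it from the repo root: subprocess.run(cmd, check=True)
```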

The system also supports automatic part count suggestion using a Vision-Language Model (VLM) if the user prefers not to specify --num_parts manually.

verdict

PartCrafter offers a concrete implementation of a compositional 3D mesh generator using latent diffusion transformers. It stands out by producing multiple semantically distinct parts from a single RGB input. This is a valuable step toward structured 3D generation that enables downstream editing and scene composition.

The repo’s code quality and documentation make it accessible for researchers and practitioners familiar with diffusion models and 3D generation pipelines. However, the resource requirements and complexity mean it’s best suited for experimental research setups or advanced prototyping rather than lightweight or production use.

If you’re working on 3D generative models, especially with a focus on compositionality or part-level manipulation, PartCrafter is worth exploring. The VLM-based part count suggestion and the Gemini style transfer make it more practical on real-world photos.

That said, expect a learning curve given the model's complexity and its reliance on high-end GPUs. The released pretrained weights and inference scripts provide a good starting point for experimenting with this compositional approach.

Overall, PartCrafter is a solid research artifact that pushes the boundary of single-shot, part-level 3D mesh generation from images, balancing architectural innovation with practical tooling for exploration and development.


→ GitHub Repo: wgsxm/PartCrafter ⭐ 2,420 · Python