Pixal3D: pixel-aligned 3D asset generation from a single image with projection conditioning

Pixal3D takes a distinct approach to single-image 3D asset generation: it explicitly back-projects pixel features into 3D space, establishing direct pixel-to-3D correspondences. This departs from the more common cross-attention injection of image features and pushes the fidelity of generated assets toward reconstruction-level quality.

pixel-aligned 3d asset generation from a single image

Pixal3D is a Python-based open-source project developed collaboratively by Tsinghua University and Tencent ARC Lab, presented as a SIGGRAPH 2026 paper. The repo focuses on generating high-fidelity 3D assets complete with physically based rendering (PBR) textures from just one input image.

At its core, Pixal3D introduces pixel-aligned projection conditioning. Instead of loosely injecting image features through cross-attention mechanisms, it back-projects pixel features into 3D space by leveraging camera geometry. This step firmly grounds the 3D reconstruction in the pixel data, resulting in a direct pixel-to-3D feature correspondence that lifts the quality ceiling beyond previous methods.

The architecture implements a three-stage cascade: it starts from a sparse structure, then refines the shape, and finally generates the texture. Resolution increases progressively across the stages, from 32 up to 1024, which allows finer geometric detail and more accurate texture.
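The data flow of such a cascade can be sketched in a few lines. This is a toy illustration only, assuming each stage is a function that reads what the previous stage produced; the stage functions, the `State` dictionary, and `run_cascade` are hypothetical names, not the repository's API, and the string values stand in for real tensors and meshes.

```python
from typing import Callable, Dict, List

State = Dict[str, str]
Stage = Callable[[State], State]


def sparse_structure_stage(state: State) -> State:
    # Stage 1: coarse layout derived from the input image (placeholder value).
    return {**state, "structure": f"coarse structure of {state['image']}"}


def shape_stage(state: State) -> State:
    # Stage 2: requires the coarse structure and refines it into geometry.
    return {**state, "shape": f"refined shape from [{state['structure']}]"}


def texture_stage(state: State) -> State:
    # Stage 3: requires the refined shape and adds texture on top of it.
    return {**state, "texture": f"texture for [{state['shape']}]"}


def run_cascade(image: str, stages: List[Stage]) -> State:
    state: State = {"image": image}
    for stage in stages:
        state = stage(state)  # each stage consumes the previous result
    return state


result = run_cascade(
    "0_img.png", [sparse_structure_stage, shape_stage, texture_stage]
)
print(sorted(result))
```

Running a later stage without its predecessor fails with a `KeyError`, which mirrors the dependency between stages: an error or low-quality result early in the cascade carries through to everything after it.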

Technically, the main branch uses an improved TRELLIS.2 model as its backbone, a design choice that likely balances performance and accuracy. For reproducibility, the repository also maintains the original Direct3D-S2 implementation referenced in the paper.

The stack centers around Python with PyTorch for deep learning, CUDA for GPU acceleration, and includes a suite of training scripts, inference utilities, and a Gradio-based web demo for interactive exploration. Notably, the repo provides a low-VRAM mode to accommodate consumer GPUs by loading models on-demand, trading off some speed for reduced memory footprint.
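The on-demand loading idea behind a low-VRAM mode can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `OnDemandModels` and its loader callables are hypothetical, and plain strings stand in for real networks. In a PyTorch setting the release step would typically also move weights off the GPU and could call `torch.cuda.empty_cache()`.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class OnDemandModels:
    """Keeps a model in memory only while it is being used."""

    def __init__(self, loaders: Dict[str, Callable[[], object]]) -> None:
        self._loaders = loaders
        self.resident = 0        # models currently held
        self.peak_resident = 0   # highest number held at once
        self.load_count = 0      # how many loads were paid for

    @contextmanager
    def use(self, name: str) -> Iterator[object]:
        model = self._loaders[name]()  # load only when needed (the slow part)
        self.resident += 1
        self.load_count += 1
        self.peak_resident = max(self.peak_resident, self.resident)
        try:
            yield model
        finally:
            del model  # drop the reference so the memory can be reclaimed
            self.resident -= 1


# Hypothetical loaders; real ones would read checkpoints from disk.
models = OnDemandModels({
    "structure": lambda: "structure-model",
    "shape": lambda: "shape-model",
    "texture": lambda: "texture-model",
})

for stage_name in ("structure", "shape", "texture"):
    with models.use(stage_name) as model:
        pass  # run the stage with `model` here

print(models.peak_resident, models.load_count)
```

The tradeoff is visible in the counters: only one model is resident at any time, but every use pays the loading cost again, which is where the speed penalty comes from.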

pixel-aligned projection conditioning: the key technical strength

What sets Pixal3D apart is its pixel-aligned projection conditioning mechanism. Previous single-image 3D generation models typically rely on cross-attention to inject image features into the 3D generation pipeline. This approach, while effective, is indirect: the model has to learn which image region informs which part of the 3D output, and spatial correspondence can be lost along the way.

Pixal3D flips this by back-projecting the pixel features from the 2D image into the 3D space using camera geometry. This explicit geometric conditioning ensures that each 3D point corresponds directly to a pixel feature in the source image, minimizing ambiguity and information loss.
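To make the geometry concrete, here is a toy sketch of pixel-aligned feature lookup. It assumes a simple pinhole camera with points already expressed in camera coordinates and uses nearest-pixel sampling; the function names and the nested-list feature map are illustrative choices, not Pixal3D's code. A real model would work on learned feature tensors and would likely interpolate between pixels.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Feature = Sequence[float]


def project_point(
    point: Point, fx: float, fy: float, cx: float, cy: float
) -> Optional[Tuple[float, float]]:
    """Project a camera-space 3D point to pixel coordinates (pinhole model)."""
    x, y, z = point
    if z <= 0:  # behind the camera: no valid projection
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def back_project_features(
    points: Sequence[Point],
    feature_map: Sequence[Sequence[Feature]],  # indexed as [row][col]
    fx: float, fy: float, cx: float, cy: float,
) -> List[Optional[Feature]]:
    """Give each 3D point the feature of the pixel it projects onto."""
    height, width = len(feature_map), len(feature_map[0])
    sampled: List[Optional[Feature]] = []
    for point in points:
        uv = project_point(point, fx, fy, cx, cy)
        if uv is None:
            sampled.append(None)
            continue
        col, row = math.floor(uv[0]), math.floor(uv[1])  # nearest-pixel lookup
        inside = 0 <= row < height and 0 <= col < width
        sampled.append(feature_map[row][col] if inside else None)
    return sampled


# A 2x2 "image" with one feature channel per pixel.
features = [[[1.0], [2.0]], [[3.0], [4.0]]]
points = [(0.0, 0.0, 1.0), (-0.5, -0.5, 1.0), (5.0, 0.0, 1.0)]
print(back_project_features(points, features, fx=1.0, fy=1.0, cx=1.0, cy=1.0))
```

Every 3D point that lands inside the image receives exactly the feature of the pixel it projects to, which is the direct correspondence that cross-attention has to learn implicitly. Points outside the view get no feature in this sketch.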

This architectural decision is not trivial. It demands accurate camera parameters and a robust projection pipeline within the model’s architecture. The three-stage cascade adds further complexity to training and inference, since each stage (sparse structure, shape, texture) builds on the previous one at increasing resolution.

The tradeoff here is clear: the model achieves near-reconstruction-level fidelity on single-image 3D generation tasks but at the cost of architectural complexity and computational demand. The inclusion of a low-VRAM mode in the repo acknowledges this and attempts to make the technology accessible on consumer hardware, though performance may vary.

From a code quality perspective, the repo is surprisingly clean given the complexity. It includes modular code for data preparation, training, inference, and the demo, with clear separation of concerns. Keeping the original Direct3D-S2 implementation alongside the improved TRELLIS.2-based main branch demonstrates a commitment to reproducibility and community validation.

quick start with pixal3d

The repo provides clear installation and usage instructions designed to get you up and running efficiently.

First, you need to follow the TRELLIS.2 installation guide to set up the base environment. This is a prerequisite since Pixal3D builds on top of that framework.

Next, install the additional dependencies:

pip install -r requirements.txt

Then install natten, a specialized CUDA-accelerated library, with the appropriate CUDA architecture and worker count for your machine:

NATTEN_CUDA_ARCH="xx" NATTEN_N_WORKERS=xx pip install natten==0.21.0 --no-build-isolation

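Replace each xx with a value for your machine: NATTEN_CUDA_ARCH is the CUDA architecture of your GPU, and NATTEN_N_WORKERS is the number of workers used for the build.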
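If you are unsure what to put in place of the placeholders, a small helper can assemble the command. This is a sketch under assumptions: it assumes NATTEN_CUDA_ARCH expects the GPU's compute capability written as "major.minor" (for example 8.6), which you should confirm against the natten documentation, and the helper names are hypothetical. The detection function relies on PyTorch's `torch.cuda.get_device_capability` and returns nothing if PyTorch or a GPU is unavailable.

```python
from typing import Optional, Tuple


def format_cuda_arch(capability: Tuple[int, int]) -> str:
    """Turn a (major, minor) compute capability into 'major.minor'."""
    major, minor = capability
    return f"{major}.{minor}"


def detect_capability() -> Optional[Tuple[int, int]]:
    """Ask PyTorch for the first GPU's compute capability, if possible."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def natten_install_command(capability: Tuple[int, int], workers: int) -> str:
    """Build the install command from the article with concrete values."""
    return (
        f'NATTEN_CUDA_ARCH="{format_cuda_arch(capability)}" '
        f"NATTEN_N_WORKERS={workers} "
        "pip install natten==0.21.0 --no-build-isolation"
    )


capability = detect_capability()
if capability is not None:
    print(natten_install_command(capability, workers=4))  # 4 is an arbitrary example
```

The worker count is left as an explicit argument because compiling CUDA extensions is memory-hungry; a conservative number is usually the safer starting point.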

Finally, install utils3d:

pip install https://github.com/LDYang694/Storages/releases/download/20260430/utils3d-0.0.2-py3-none-any.whl

To generate a 3D mesh from a single image, use the inference script:

python inference.py --image assets/images/0_img.png --output ./output.glb

If you are constrained on GPU memory, the low-VRAM mode reduces peak usage by loading models on demand:

python inference.py --image assets/images/0_img.png --output ./output.glb --low_vram

By default, the resolution is 1536 in standard mode and 1024 in low-VRAM mode, but you can override it with the --resolution flag.
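For scripted or batch use, the flags above can be assembled programmatically. The sketch below only builds the argument list and mirrors the defaults described here; it does not reproduce the script's internals, the helper names are hypothetical, and it assumes --resolution takes a single integer value.

```python
from typing import List, Optional

# Defaults as described in the text: 1536 normally, 1024 in low-VRAM mode.
STANDARD_RESOLUTION = 1536
LOW_VRAM_RESOLUTION = 1024


def effective_resolution(low_vram: bool, resolution: Optional[int] = None) -> int:
    """Resolution that would apply: an explicit override wins over the default."""
    if resolution is not None:
        return resolution
    return LOW_VRAM_RESOLUTION if low_vram else STANDARD_RESOLUTION


def build_inference_command(
    image: str,
    output: str,
    low_vram: bool = False,
    resolution: Optional[int] = None,
) -> List[str]:
    """Assemble the inference.py invocation as an argument list."""
    command = ["python", "inference.py", "--image", image, "--output", output]
    if low_vram:
        command.append("--low_vram")
    if resolution is not None:
        # Assumption: the flag accepts one integer argument.
        command += ["--resolution", str(resolution)]
    return command


command = build_inference_command(
    "assets/images/0_img.png", "./output.glb", low_vram=True
)
print(" ".join(command))
print(effective_resolution(low_vram=True))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root, which avoids shell quoting issues when image paths contain spaces.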

This workflow makes it practical to experiment with Pixal3D on accessible hardware without deep expertise in the underlying architecture.

verdict

Pixal3D offers a technically interesting approach to single-image 3D asset creation by explicitly establishing pixel-to-3D correspondences through projection conditioning. This design pushes fidelity toward reconstruction-level quality, a significant step beyond more indirect feature injection methods.

Its three-stage cascade and reliance on camera geometry introduce a fair amount of complexity, both conceptually and computationally, but the repo mitigates this with a low-VRAM mode and comprehensive tooling.

This repo is particularly relevant for researchers and developers working on 3D reconstruction, graphics, or AI-driven asset generation who want to explore the boundaries of single-image 3D synthesis. It’s less suited for quick prototyping or projects without access to GPU resources.

Overall, the codebase strikes a good balance between cutting-edge research and practical usability, with a clear focus on reproducibility and accessibility for practitioners willing to engage with the architectural depth behind pixel-aligned 3D generation.


→ GitHub Repo: TencentARC/Pixal3D ⭐ 1,315 · Python

Noureddine RAMDI / Pixal3D: pixel-aligned 3D asset generation from a single image with projection conditioning
