Lynx generates personalized videos from a single image using a frozen Diffusion Transformer with ID and Ref adapters. This modular design balances fidelity and efficiency.
PartCrafter generates multiple semantically distinct 3D mesh parts from a single RGB image using latent diffusion transformers, enabling structured 3D generation with pretrained models and VLM-based part suggestions.
SVFR combines blind face restoration, colorization, and inpainting in a single stable video diffusion model, enabling efficient multi-task video face enhancement.
ComfyUI-Trellis2 integrates Facebook’s DINOv3 model into ComfyUI for advanced 3D-aware diffusion workflows. This article breaks down its architecture, strengths, and installation steps.
Matrix-3D generates explorable 360-degree 3D worlds from text or images using panoramic video and 3D Gaussian splatting, optimized to run on consumer GPUs with 12-19 GB of VRAM.
MeanVC enables real-time zero-shot voice conversion using mean flows and diffusion transformers for single-step inference, addressing latency bottlenecks in diffusion models.
Avatar Forcing implements diffusion forcing for causal, real-time processing of multimodal input, enabling expressive head avatars with ~500 ms latency and a 6.8x speedup over baselines.
ComfyUI offers a graph/node interface for building complex diffusion model workflows offline, blending modularity with flexibility for AI practitioners.