4DGen extends Stable Video Diffusion to generate geometry-consistent multi-view RGB-D videos from a single RGB-D input using pointmap latents. Trained on multi-view robotic datasets, it enables robot pose extraction from the generated videos.
Genie Envisioner offers a two-stage, video-diffusion-based training pipeline for robotic manipulation that separates world model adaptation from action policy learning. Here’s how it works and how to get started.
MotionCrafter jointly reconstructs 4D geometry and dense motion from monocular video using a unified 4D VAE, eliminating the need for post-optimization. This Python framework offers training and visualization tools.