Tencent’s HY-World 2.0 approaches generative world modeling by producing persistent, editable 3D assets rather than video clips alone. Unlike video-only world models such as Genie 3 or Cosmos, the framework generates 3D meshes and Gaussian splatting representations that can be imported directly into tools such as Blender, Unity, and Unreal Engine for real-time rendering on consumer GPUs.
multi-modal pipeline for 3D world generation
At its core, HY-World 2.0 decomposes world synthesis into four sequential stages, each handled by a dedicated component. The listed model sizes range from roughly 1.2B to 80B parameters:
HY-Pano 2.0 (~80B parameters): Generates high-resolution panoramas as the foundational visual representation of the world.
WorldNav: Plans camera trajectories or viewpoints to navigate the generated panorama, enabling coherent exploration.
WorldStereo 2.0 (~17B parameters): Expands the world beyond the panorama by synthesizing stereo views, effectively building out 3D structure.
WorldMirror 2.0 (~1.2B parameters): Performs unified feed-forward reconstruction, predicting depth, surface normals, camera parameters, point clouds, and 3D Gaussian splatting attributes in a single forward pass.
This staged pipeline tackles the problem incrementally: first establish a panoramic base, then plan viewpoints, then extend the scene beyond the panorama, and finally reconstruct a full 3D representation.
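The staged flow described above can be pictured as a simple chain in which each stage consumes the previous stage's output. The sketch below is illustrative only: `Stage`, `run_pipeline`, and the placeholder callables are hypothetical names, not the repo's API, and the parameter counts are the approximate figures quoted in the stage list.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass(frozen=True)
class Stage:
    """One pipeline step (hypothetical container, not the repo's API)."""
    name: str
    role: str
    approx_params_b: Optional[float]  # billions; None where no figure is quoted
    run: Callable[[Any], Any]


def run_pipeline(stages: List[Stage], prompt: Any) -> Any:
    """Feed each stage's output into the next one, in order."""
    state = prompt
    for stage in stages:
        state = stage.run(state)
    return state


def _tag(label: str) -> Callable[[Any], Any]:
    """Placeholder stage body: only records which artifact was produced."""
    return lambda state: state + [label]


# Stage order and sizes follow the article; the callables are stand-ins.
STAGES = [
    Stage("HY-Pano 2.0", "panorama generation", 80.0, _tag("panorama")),
    Stage("WorldNav", "trajectory planning", None, _tag("trajectories")),
    Stage("WorldStereo 2.0", "world expansion", 17.0, _tag("expanded views")),
    Stage("WorldMirror 2.0", "feed-forward reconstruction", 1.2, _tag("3D assets")),
]

if __name__ == "__main__":
    print(run_pipeline(STAGES, ["prompt"]))
```

Keeping the stages as data makes it easy to run only a suffix of the chain, for example reconstruction alone on views that already exist.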
The framework supports flexible-resolution inference from 50K to 500K pixels, allowing users to balance detail and performance according to their needs. All model weights and inference code are provided in the repo, enabling reproducibility and experimentation.
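To make the 50K to 500K pixel range concrete, the sketch below picks an output size for a given pixel budget while preserving aspect ratio. It is an assumption-laden illustration: `fit_to_pixel_budget` is a hypothetical helper, and the `multiple=14` rounding (a common vision-transformer patch size) is a guess rather than something the repo is known to require.

```python
import math
from typing import Tuple

MIN_PIXELS = 50_000   # lower bound quoted in the article
MAX_PIXELS = 500_000  # upper bound quoted in the article


def fit_to_pixel_budget(width: int, height: int, target_pixels: int,
                        multiple: int = 14) -> Tuple[int, int]:
    """Scale (width, height) so the area fits the pixel budget.

    The budget is clamped to [MIN_PIXELS, MAX_PIXELS]. Both sides are
    rounded down to a multiple of `multiple` (an assumed patch size), so
    the result never exceeds the budget but may land slightly below it.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    budget = min(max(target_pixels, MIN_PIXELS), MAX_PIXELS)
    scale = math.sqrt(budget / (width * height))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


if __name__ == "__main__":
    for budget in (50_000, 200_000, 500_000):
        w, h = fit_to_pixel_budget(1920, 1080, budget)
        print(budget, (w, h), w * h)
```

Lower budgets cut the input size roughly in proportion, which is the lever the flexible-resolution setting offers on constrained hardware.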
unified feed-forward reconstruction and editability as technical strengths
What sets HY-World 2.0 apart is the emphasis on generating persistent, editable 3D assets rather than ephemeral video outputs. The WorldMirror 2.0 model consolidates multiple geometry and appearance predictions into a single forward pass, improving inference efficiency and consistency.
This unified approach contrasts with many existing methods that require separate models or optimization loops for depth, normals, and point clouds. By integrating these predictions, the system reduces runtime complexity and creates a 3D world representation compatible with standard graphics engines.
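The efficiency argument can be illustrated with a toy model in which several output heads share one encoder pass. This is a conceptual sketch, not WorldMirror 2.0's architecture: `ToyUnifiedModel`, its heads, and the arithmetic inside them are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List

Features = List[float]


class ToyUnifiedModel:
    """Toy stand-in: one shared encoder feeding several prediction heads."""

    def __init__(self) -> None:
        self.encoder_calls = 0
        # Each head maps shared features to one output type (dummy math).
        self.heads: Dict[str, Callable[[Features], list]] = {
            "depth": lambda f: [x * 2.0 for x in f],
            "normals": lambda f: [(0.0, 0.0, 1.0) for _ in f],
            "camera": lambda f: [len(f)],
            "points": lambda f: [(x, x, x) for x in f],
            "gaussians": lambda f: [{"mean": x, "opacity": 1.0} for x in f],
        }

    def encode(self, views: List[List[float]]) -> Features:
        """Pretend backbone: the expensive step we want to run only once."""
        self.encoder_calls += 1
        return [sum(v) / len(v) for v in views]

    def forward_unified(self, views: List[List[float]]) -> Dict[str, list]:
        """Single pass: encode once, then evaluate every head."""
        features = self.encode(views)
        return {name: head(features) for name, head in self.heads.items()}

    def forward_separate(self, views: List[List[float]]) -> Dict[str, list]:
        """Baseline: re-encode for every output, as separate models would."""
        return {name: head(self.encode(views)) for name, head in self.heads.items()}


if __name__ == "__main__":
    views = [[0.1, 0.3], [0.5, 0.7]]
    unified, separate = ToyUnifiedModel(), ToyUnifiedModel()
    unified.forward_unified(views)
    separate.forward_separate(views)
    print(unified.encoder_calls, separate.encoder_calls)
```

Because every head reads the same features, the outputs are also derived from one consistent internal representation, which is the consistency benefit mentioned above.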
The use of 3D Gaussian Splatting (3DGS) as part of the output is particularly notable. 3DGS represents a scene as a large set of anisotropic 3D Gaussians that are rasterized directly, which makes real-time rendering on consumer GPUs practical. This choice balances rendering quality against computational cost, making the generated worlds usable in interactive applications.
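For a rough sense of what a 3DGS scene costs in memory, the sketch below counts the floats per Gaussian in the layout used by the original 3DGS formulation: position, scale, rotation quaternion, opacity, and spherical-harmonic color coefficients. Whether HY-World 2.0 stores exactly these attributes is an assumption, and both function names are hypothetical.

```python
def floats_per_gaussian(sh_degree: int = 3) -> int:
    """Number of float attributes per Gaussian in the standard 3DGS layout."""
    if sh_degree < 0:
        raise ValueError("sh_degree must be non-negative")
    position = 3                      # x, y, z
    scale = 3                         # per-axis extent
    rotation = 4                      # unit quaternion
    opacity = 1
    color = 3 * (sh_degree + 1) ** 2  # RGB spherical-harmonic coefficients
    return position + scale + rotation + opacity + color


def scene_size_mib(num_gaussians: int, sh_degree: int = 3,
                   bytes_per_float: int = 4) -> float:
    """Uncompressed size of the attribute buffers in MiB."""
    total_bytes = num_gaussians * floats_per_gaussian(sh_degree) * bytes_per_float
    return total_bytes / 2 ** 20


if __name__ == "__main__":
    for n in (100_000, 1_000_000, 5_000_000):
        print(n, round(scene_size_mib(n), 1), "MiB")
```

Under these assumptions a Gaussian takes 59 floats at spherical-harmonic degree 3, so a scene with a million Gaussians occupies a few hundred MiB in 32-bit floats before any compression.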
The tradeoff is model size: HY-Pano 2.0 alone is listed at roughly 80B parameters, which demands substantial GPU resources. The software stack is also specific, with CUDA 12.8 and Python 3.11 or above recommended. The repo’s support for flexible-resolution inference helps mitigate resource demands by allowing users to scale down pixel counts.
Shipping all model weights and inference code is a plus for transparency and experimentation, but it implies a significant download and storage footprint.
quick start with environment setup and dependencies
Installing and running HY-World 2.0 requires preparing a suitable environment. The recommendation is CUDA 12.8 with Python 3.11 or above. The installation is split into a shared environment setup followed by specific dependencies for the world reconstruction and generation components.
Here is the installation workflow captured verbatim from the repo’s README:
git clone https://github.com/Tencent-Hunyuan/HY-World-2.0
cd HY-World-2.0
conda create -n hyworld2 python=3.11.15
conda activate hyworld2
Next, install the dependencies for the World Reconstruction (WorldMirror 2.0) stage. This includes a custom variant of the gsplat package installed in editable mode:
# Recommended: install the custom gsplat variant once for both worldrecon and worldgen
cd hyworld2/worldgen/third_party/gsplat_maskgaussian
pip install -e . --no-build-isolation
cd ../../../../
If you only need the world reconstruction part and prefer a simpler fallback, the official gsplat package is supported:
pip install git+https://github.com/nerfstudio-project/gsplat.git
Finally, install a FlashAttention backend. This step assumes torch and CUDA are already installed:
pip install --no-build-isolation -r requirements_git.txt
This setup reflects a modular approach where users can opt to prepare just the reconstruction environment or the full generation pipeline.
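After installation, a quick sanity check can confirm that the environment matches the stated requirements. The script below is a hedged sketch rather than part of the repo: the helper names are hypothetical, and the import names `gsplat` and `flash_attn` are assumed to be the modules those packages install.

```python
import importlib.util
import sys
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a string like '12.8' into (12, 8), keeping major and minor only."""
    return tuple(int(part) for part in text.split(".")[:2])


def python_ok(version: Tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """True when the interpreter is Python 3.11 or newer."""
    return tuple(version) >= (3, 11)


def module_available(name: str) -> bool:
    """True when a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def torch_cuda_version() -> Optional[str]:
    """CUDA version torch was built against, or None if unavailable."""
    if not module_available("torch"):
        return None
    import torch  # imported lazily so the script runs without torch
    return torch.version.cuda  # None on CPU-only builds


if __name__ == "__main__":
    print("Python >= 3.11:", python_ok())
    cuda = torch_cuda_version()
    print("torch CUDA build:", cuda)
    if cuda is not None:
        print("CUDA >= 12.8:", parse_version(cuda) >= (12, 8))
    for name in ("gsplat", "flash_attn"):  # assumed import names
        print(f"{name} importable:", module_available(name))
```

Running it inside the `hyworld2` conda environment shows at a glance which of the optional pieces are missing before any model weights are loaded.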
verdict: suited for researchers and developers working on 3D world modeling
HY-World 2.0 is a significant technical effort that takes generative world models in a different direction: it outputs persistent, editable 3D representations instead of ephemeral videos. Its four-stage pipeline is a clear architectural decomposition of multi-modal world generation.
The repo is best suited for researchers and developers with access to high-end GPUs and familiarity with Python and CUDA environments. The large model sizes and complex dependencies mean this is not a quick plug-and-play solution but rather a foundation for experimentation and further research.
The ability to export directly to Blender, Unity, and Unreal Engine makes it valuable for applications requiring real-time rendering and interactivity, such as game development, VR, or simulation.
On the downside, the resource requirements and installation complexity may be a barrier for hobbyists or those without appropriate hardware. Also, although the repo includes inference code and weights, working out usage scenarios and customization may take some exploration of the code.
Overall, HY-World 2.0 offers an insightful reference implementation of persistent 3D world generation from multi-modal inputs, pushing the state of generative models toward more practical and editable outputs.
→ GitHub Repo: Tencent-Hunyuan/HY-World-2.0 ⭐ 2,075 · Python