PEAR: real-time expressive 3D human mesh recovery at 100 FPS

PEAR tackles something most 3D human mesh recovery systems struggle with: it predicts expressive whole-body meshes, covering the body, hands, and face, simultaneously at 100 frames per second. Pairing this real-time speed with the fidelity of expressive mesh modeling makes it a noteworthy reference implementation for 3D human reconstruction from monocular images or video.

What PEAR does: expressive real-time 3D human mesh recovery

PEAR is a research codebase implementing the SIGGRAPH 2026 paper on real-time expressive 3D human mesh recovery from images and videos. It outputs what the authors call EHM-s (Expressive Human Meshes), which extend standard body models by jointly modeling the full body, hands, and face. Under the hood, this involves predicting parameters for three widely used parametric human models: SMPL-X, SMPL, and FLAME.
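To make "predicting parameters for a parametric model" concrete: a regression head typically emits one flat vector that is then sliced into named parameter groups. The sketch below assumes a hypothetical layout loosely following common SMPL-X conventions (axis-angle rotations, 21 body joints, 15 joints per hand, 10 shape and 10 expression coefficients, eye poses omitted). PEAR's actual EHM-s output layout is not documented here and may well differ.

```python
# Hypothetical flat-output layout, loosely following common SMPL-X conventions.
# This is an illustrative assumption, not PEAR's documented EHM-s layout.
PARAM_LAYOUT = [
    ("global_orient", 3),        # root rotation, axis-angle
    ("body_pose", 21 * 3),       # 21 body joints, axis-angle each
    ("left_hand_pose", 15 * 3),  # 15 finger joints per hand
    ("right_hand_pose", 15 * 3),
    ("jaw_pose", 3),
    ("betas", 10),               # body shape coefficients
    ("expression", 10),          # facial expression coefficients
]


def split_params(flat):
    """Slice a flat prediction vector into named parameter groups."""
    expected = sum(size for _, size in PARAM_LAYOUT)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    groups, offset = {}, 0
    for name, size in PARAM_LAYOUT:
        groups[name] = list(flat[offset:offset + size])
        offset += size
    return groups
```

With this layout the head would output 179 values; split_params([0.0] * 179)["body_pose"] yields the 63 body-pose entries.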

The architecture builds on prior work such as HMR2 and Multi-HMR, which also predict human mesh parameters from monocular inputs, but PEAR distinguishes itself by estimating body, hands, and face simultaneously at a high frame rate. The stack is centered on PyTorch and PyTorch3D and relies on GPU acceleration for real-time inference.
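Methods in the HMR family typically tie predicted 3D joints back to the image by projecting them through a camera model, which is what makes 2D keypoint supervision possible. A minimal sketch, assuming a pinhole camera with focal length and principal point in pixels. The weak-perspective conversion is one convention found in HMR-style codebases; whether PEAR uses it is not confirmed, and both function names are hypothetical.

```python
def weak_perspective_to_translation(s, tx, ty, focal, img_size):
    """Convert a weak-perspective camera (scale s, offsets tx, ty), predicted on an
    img_size x img_size crop, into a 3D camera translation.

    In normalized [-1, 1] crop coordinates, weak perspective gives s * X while a
    pinhole camera gives 2 * focal * X / (img_size * tz); equating the two yields
    tz = 2 * focal / (img_size * s).
    """
    if s <= 0:
        raise ValueError("scale must be positive")
    return (tx, ty, 2.0 * focal / (img_size * s))


def project_points(points_3d, focal, cx, cy, cam_t):
    """Translate 3D points by cam_t and project them to pixels with a pinhole model."""
    tx, ty, tz = cam_t
    pixels = []
    for x, y, z in points_3d:
        X, Y, Z = x + tx, y + ty, z + tz
        if Z <= 0:
            raise ValueError("point lies behind the camera")
        pixels.append((focal * X / Z + cx, focal * Y / Z + cy))
    return pixels
```

A 2D reprojection loss then compares these pixels against detected or annotated keypoints.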

The pipeline supports two inference modes, single-image input and video input, each with its own entry point; the video path must additionally keep predictions temporally consistent while staying within the real-time budget. The training code is also released, but the complete training datasets are not public; a sample dataset archive is provided for demonstration instead.
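How the video entry point enforces temporal consistency is not described here. As a stand-in, the simplest form of temporal smoothing is an exponential moving average over per-frame parameter vectors. This is an illustrative assumption, not PEAR's method, and ema_smooth is a hypothetical helper; note that averaging axis-angle rotations linearly is only reasonable when frame-to-frame changes are small.

```python
def ema_smooth(frames, alpha=0.6):
    """Exponentially smooth a sequence of per-frame parameter vectors.

    alpha weights the current frame; (1 - alpha) weights the running estimate.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed, state = [], None
    for params in frames:
        if state is None:
            state = list(params)  # first frame initializes the estimate
        else:
            state = [alpha * p + (1.0 - alpha) * s for p, s in zip(params, state)]
        smoothed.append(state)
    return smoothed
```

A higher alpha follows the current frame more closely (less lag, more jitter); a lower alpha smooths more but lags behind fast motion.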

The external dependencies include the SMPL, SMPL-X, and FLAME body model assets, which must be downloaded separately. These parametric models are de facto standards in research for representing human shape and pose: a compact set of shape and pose parameters drives a template mesh, which gives a structured way to reconstruct 3D meshes from 2D images.
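The core idea behind SMPL-family models fits in a few lines: body shape is a linear combination of learned per-vertex offsets (shape blend shapes) added to a template mesh, v = T + sum_n beta_n * S_n. The sketch below covers only this shape term on toy data; the real models also add pose-dependent corrective offsets and then pose the mesh with linear blend skinning, both omitted here.

```python
def shape_blend(template, shape_dirs, betas):
    """Apply linear shape blend shapes: v = template + sum_n betas[n] * shape_dirs[n].

    template:   list of (x, y, z) rest-pose vertices
    shape_dirs: one list of per-vertex (dx, dy, dz) offsets per shape coefficient
    betas:      shape coefficients, one per entry in shape_dirs
    """
    if len(shape_dirs) != len(betas):
        raise ValueError("need one shape direction per beta")
    verts = [list(v) for v in template]
    for beta, direction in zip(betas, shape_dirs):
        for vert, offset in zip(verts, direction):
            for axis in range(3):
                vert[axis] += beta * offset[axis]
    return [tuple(v) for v in verts]
```

Because the mapping is linear, a handful of betas can express a wide range of body shapes, which is what makes these models convenient regression targets.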

What sets PEAR apart: pixel-aligned prediction and real-time speed

The standout technical feature of PEAR is that it predicts EHM-s parameters for body, hands, and face simultaneously at 100 FPS, which leaves a budget of 10 ms per frame. Reaching this speed is non-trivial because expressive mesh recovery requires fine-grained parameter estimation across several body parts, including small regions such as the fingers and face.
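100 FPS means a budget of 1000 / 100 = 10 ms per frame for the whole forward pass. The sketch below turns that arithmetic into a small timing harness for any per-frame inference callable; frame_budget_ms and measure_fps are hypothetical names. When timing a PyTorch model on a GPU, call torch.cuda.synchronize() before reading the clock, because CUDA kernels launch asynchronously and the timer would otherwise stop early.

```python
import time


def frame_budget_ms(target_fps):
    """Per-frame latency budget (ms) needed to sustain target_fps."""
    return 1000.0 / target_fps


def measure_fps(infer, inputs, warmup=3):
    """Time a per-frame inference callable and return the achieved frames per second."""
    if not inputs:
        raise ValueError("need at least one input")
    for x in inputs[:warmup]:
        infer(x)  # warm-up runs (caches, lazy init) are excluded from timing
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    # On GPU, synchronize here (e.g. torch.cuda.synchronize()) before stopping the clock.
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed
```

Comparing 1000 / measure_fps(...) against frame_budget_ms(100) shows how much headroom remains for pre- and post-processing.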

The speed is attributed to a pixel-aligned architecture that tightly couples image pixel features with mesh parameter prediction. The code uses PyTorch3D for 3D operations, which handles the geometric transformations and rendering efficiently.
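"Pixel-aligned" usually means features are read from the image feature map at specific 2D locations (for example, projected keypoints or vertices) rather than from a single global vector. The basic operation is bilinear sampling at a continuous pixel coordinate, sketched below in plain Python; in PyTorch this is typically done with torch.nn.functional.grid_sample on normalized coordinates. How PEAR uses such features is not detailed here, so treat this as a generic illustration with a hypothetical helper name.

```python
import math


def bilinear_sample(feature_map, x, y):
    """Sample a 2D feature map at continuous pixel (x, y) with bilinear interpolation.

    feature_map[row][col] is a list of channel values; x indexes columns, y rows.
    Coordinates are clamped to the map bounds.
    """
    h, w = len(feature_map), len(feature_map[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    f00, f01 = feature_map[y0][x0], feature_map[y0][x1]  # top-left, top-right
    f10, f11 = feature_map[y1][x0], feature_map[y1][x1]  # bottom-left, bottom-right
    return [
        (1 - dx) * (1 - dy) * a + dx * (1 - dy) * b + (1 - dx) * dy * c + dx * dy * d
        for a, b, c, d in zip(f00, f01, f10, f11)
    ]
```

Sampling the single-channel 2 x 2 map [[0, 1], [2, 3]] at (0.5, 0.5) returns the average of all four cells, 1.5.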

From a code quality perspective, the repository is unusually clean for a research implementation: data loading, model definition, training, and inference are clearly separated, and the code follows PyTorch conventions for GPU acceleration, batch processing, and checkpointing.

The tradeoff here is the reliance on external body model assets, which are not bundled due to licensing, and the incomplete public release of training data. This means reproducing training results exactly will require additional effort or access to private datasets. However, the inference pipeline is complete and usable on standard hardware with a GPU.

Overall, PEAR balances real-time performance with expressive detail, a combination on which most existing systems compromise, sacrificing either speed or the fidelity of hand and face modeling.

Quick start

Preparation

Clone the repository together with its submodules (hence the --recursive flag), then install the dependencies as described in the README:

git clone --recursive https://github.com/Pixel-Talk/PEAR.git
cd PEAR

The README notes that the SMPL, SMPL-X, and FLAME body model assets must be downloaded separately, so follow the README's instructions for placing these assets correctly before running the code.
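Missing or misplaced body model files are a common first failure. A small pre-flight check can catch that before a long run; the directory names below are hypothetical placeholders, not the layout PEAR's README prescribes, so adjust REQUIRED_ASSETS to match it.

```python
from pathlib import Path

# Hypothetical layout: these subdirectory names are placeholders, not PEAR's
# documented paths. Replace them with the locations the README specifies.
REQUIRED_ASSETS = {
    "SMPL": "smpl",
    "SMPL-X": "smplx",
    "FLAME": "flame",
}


def missing_assets(root, required=REQUIRED_ASSETS):
    """Return the names of body models whose asset directory is missing or empty."""
    root = Path(root)
    missing = []
    for name, subdir in required.items():
        path = root / subdir
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing
```

An empty result means every listed directory exists and contains at least one file; it does not validate file contents.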

The repository provides inference scripts for both single-image and video inputs, and the README explains how to run them. The training code can be explored, but, as noted above, the full datasets are not publicly available.

Verdict

PEAR is a solid reference for anyone interested in real-time 3D human mesh recovery that includes expressive hand and face modeling alongside the body. Its claimed 100 FPS throughput is impressive and matters for applications that need live processing, such as AR/VR or interactive systems.

The main limitations are the dependency on external parametric model assets and the lack of full public training datasets, which restrict out-of-the-box training from scratch. However, inference and demo use cases are well supported.

If your project involves research or development in 3D human reconstruction from monocular inputs, especially where whole-body expressiveness and real-time speed matter, PEAR is worth a close look. The codebase is approachable for PyTorch practitioners, and the modular design makes it a good foundation for further experimentation or integration.

For production use, consider the licensing and data dependencies carefully. But as a research playground and real-time demo platform, PEAR delivers a clear combination of speed and expressive detail that is rare in current open implementations.

→ GitHub Repo: Pixel-Talk/PEAR ⭐ 256 · Python

Noureddine RAMDI / PEAR: real-time expressive 3D human mesh recovery at 100 FPS