NOVA3R implements a non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images, recovering physically plausible geometry that includes occluded regions.
OmniStream uses a multi-frame transformer to process continuous video streams with patch-level temporal indexing, supporting downstream vision-language-action tasks.
PromptHMR adapts SAM’s promptable design to 3D human mesh recovery, integrating SLAM, pose detection, and SMPL models into a unified pipeline for monocular images and videos.
SceneMaker separates de-occlusion from 3D object generation to handle occluded open-set scenes. It uses FLUX Kontext and Step1X-3D; code and checkpoints are available.
SimRecon converts real-world videos into simulation-ready 3D scenes by combining geometry reconstruction, instance segmentation, viewpoint optimization, and semantic scene graph synthesis.
face_recognition provides a simple Python API and CLI for highly accurate face detection and recognition using dlib’s deep learning model. It supports facial landmarks and multi-core processing.
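
Recognition of this kind comes down to comparing fixed-length face encodings by Euclidean distance. The following is a minimal sketch of that comparison in plain Python, not the library's own code: `encoding_distances`, `is_match`, and the toy 3-dimensional vectors are hypothetical (real encodings have 128 dimensions), and the 0.6 tolerance mirrors the default of the library's `compare_faces` function.

```python
import math

def encoding_distances(known_encodings, candidate):
    """Euclidean distance from the candidate encoding to each known encoding."""
    return [math.dist(known, candidate) for known in known_encodings]

def is_match(known_encodings, candidate, tolerance=0.6):
    """True for every known encoding within `tolerance` of the candidate."""
    return [d <= tolerance for d in encoding_distances(known_encodings, candidate)]

# Toy vectors for illustration only; real encodings are 128-dimensional.
known = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
candidate = [0.1, 0.2, 0.2]
matches = is_match(known, candidate)
```

A lower tolerance makes matching stricter, trading missed matches for fewer false positives.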
Deep-Live-Cam performs real-time face swapping and deepfake video generation. It runs on ONNX Runtime, whose multiple execution providers let it target GPUs, CPUs, and Apple Silicon.
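
Choosing an execution provider can be sketched as filtering a preference list against what the local ONNX Runtime build reports. This is a minimal sketch under stated assumptions, not Deep-Live-Cam's actual logic: `pick_providers` and the preference order are hypothetical, while the provider names are standard ONNX Runtime identifiers.

```python
# Hypothetical helper; the preference order is an assumption for illustration.
PREFERRED = (
    "CUDAExecutionProvider",    # NVIDIA GPUs
    "CoreMLExecutionProvider",  # Apple Silicon
    "CPUExecutionProvider",     # universal fallback
)

def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    # Keep a CPU fallback so a session can still be created without an accelerator.
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen

# With onnxruntime installed, the available list would come from:
#   import onnxruntime as ort
#   providers = pick_providers(ort.get_available_providers())
```

The resulting list can be passed as the `providers` argument when constructing an `onnxruntime.InferenceSession`, which tries the providers in the order given.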
YOLOv5 by Ultralytics is a PyTorch-based computer vision toolkit for object detection, segmentation, and classification, designed to be accessible, fast, and accurate.