Computer-Vision on Noureddine RAMDI

Recent content in Computer-Vision on Noureddine RAMDI
https://ramdi.fr/tags/computer-vision/
Last updated: Sat, 23 May 2026 20:41:27 +0000

3D-RE-GEN: reconstructing editable 3D indoor scenes from a single photo with multi-model AI orchestration
https://ramdi.fr/github-stars/3d-re-gen-reconstructing-editable-3d-indoor-scenes-from-a-single-photo-with-multi-model-ai-orchestration/
Sat, 23 May 2026 20:41:14 +0000
3D-RE-GEN reconstructs complete editable 3D indoor scenes from a single RGB photo. It integrates SAM, Hunyuan3D-2.0, and VGGT models in a modular Python pipeline.

Autodistill: Automating vision model distillation from foundation models to edge deployables
https://ramdi.fr/github-stars/autodistill-automating-vision-model-distillation-from-foundation-models-to-edge-deployables/
Sat, 23 May 2026 20:41:14 +0000
Autodistill automates the pipeline from large foundation models to edge-ready vision models, using a plugin system and a natural-language ontology for zero-shot labeling.

Comic Translate: AI-driven multi-language comic translation with full-page context
https://ramdi.fr/github-stars/comic-translate-ai-driven-multi-language-comic-translation-with-full-page-context/
Sat, 23 May 2026 20:41:14 +0000
Comic Translate translates comics across languages with a multi-step pipeline that combines speech bubble detection, OCR, and LLMs given full-page context.

Fast3R: scalable multi-view 3D reconstruction with a single forward pass
https://ramdi.fr/github-stars/fast3r-scalable-multi-view-3d-reconstruction-with-a-single-forward-pass/
Sat, 23 May 2026 20:41:14 +0000
Fast3R from Meta FAIR processes 1000+ unordered images simultaneously for 3D reconstruction using a ViT-Large backbone and multi-view attention, eliminating iterative matching.

MASt3R-SLAM: integrating foundation-model 3D priors into real-time dense SLAM
https://ramdi.fr/github-stars/mast3r-slam-integrating-foundation-model-3d-priors-into-real-time-dense-slam/
Sat, 23 May 2026 20:41:14 +0000
MASt3R-SLAM integrates a pretrained 3D reconstruction model as a geometry prior in a dense SLAM pipeline, enabling real-time tracking and mapping without classical bundle adjustment or depth sensors.

PartCrafter: compositional 3D mesh generation with latent diffusion transformers
https://ramdi.fr/github-stars/partcrafter-compositional-3d-mesh-generation-with-latent-diffusion-transformers/
Sat, 23 May 2026 20:41:14 +0000
PartCrafter generates multiple semantically distinct 3D mesh parts from a single RGB image using latent diffusion transformers, enabling structured 3D generation with pretrained models and VLM-based part suggestions.

Pixal3D: pixel-aligned 3D asset generation from a single image with projection conditioning
https://ramdi.fr/github-stars/pixal3d-pixel-aligned-3d-asset-generation-from-a-single-image-with-projection-conditioning/
Sat, 23 May 2026 20:41:14 +0000
Pixal3D generates high-fidelity 3D assets with PBR textures from a single image using pixel-aligned projection conditioning. It offers a three-stage cascade and a low-VRAM mode for consumer GPUs.

SAM3-UNet: Adapting Meta's SAM3 for efficient dense prediction with a lightweight U-Net decoder
https://ramdi.fr/github-stars/sam3-unet-adapting-meta-s-sam3-for-efficient-dense-prediction-with-a-lightweight-u-net-decoder/
Sat, 23 May 2026 20:41:14 +0000
SAM3-UNet adapts Meta’s SAM3 foundation model for dense prediction tasks using a parameter-efficient adapter and a U-Net decoder, enabling training with under 6 GB of GPU memory.

Tencent HY-World 2.0: multi-modal pipeline for persistent, editable 3D world generation
https://ramdi.fr/github-stars/tencent-hy-world-2-0-multi-modal-pipeline-for-persistent-editable-3d-world-generation/
Sat, 23 May 2026 20:41:14 +0000
Tencent’s HY-World 2.0 generates persistent 3D assets from text, images, or video using a four-stage pipeline. It outputs editable worlds compatible with Blender, Unity, and Unreal Engine.

CodeFormer: Deep learning-based blind face restoration with fidelity control
https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/
Tue, 05 May 2026 13:37:39 +0000
CodeFormer uses a codebook transformer architecture for blind face restoration and lets users control the tradeoff between quality and fidelity with an adjustable fidelity weight parameter.

OVIE: Monocular novel view synthesis without multi-view supervision
https://ramdi.fr/github-stars/ovie-monocular-novel-view-synthesis-without-multi-view-supervision/
Tue, 05 May 2026 13:37:39 +0000
OVIE trains novel view synthesis models on unpaired internet images, avoiding the need for calibrated multi-view datasets. It uses Vision Transformers and foundation models for pose and depth encoding.

StereoWorld: stereo vision-based 3D-consistent video generation from binocular inputs
https://ramdi.fr/github-stars/stereoworld-stereo-vision-based-3d-consistent-video-generation-from-binocular-inputs/
Tue, 05 May 2026 13:37:39 +0000
StereoWorld uses binocular stereo vision cues to guide 3D-consistent stereo video generation, offering a biologically inspired approach to scene geometry understanding.

Awesome-Deblurring: A comprehensive academic resource on image and video deblurring techniques
https://ramdi.fr/github-stars/awesome-deblurring-a-comprehensive-academic-resource-on-image-and-video-deblurring-techniques/
Mon, 04 May 2026 10:23:02 +0000
Awesome-Deblurring compiles 100+ key papers tracing image and video deblurring from classical optimization to modern deep learning, serving as a go-to bibliography for researchers and developers.

MotionCrafter: unified 4D geometry and motion reconstruction from monocular video
https://ramdi.fr/github-stars/motioncrafter-unified-4d-geometry-and-motion-reconstruction-from-monocular-video/
Mon, 04 May 2026 10:23:02 +0000
MotionCrafter jointly reconstructs 4D geometry and dense motion from monocular video using a unified 4D VAE, eliminating post-optimization. This Python framework offers training and visualization tools.

MultiWorld: a unified framework for multi-agent multi-view video world modeling
https://ramdi.fr/github-stars/multiworld-a-unified-framework-for-multi-agent-multi-view-video-world-modeling/
Mon, 04 May 2026 10:23:02 +0000
MultiWorld offers a unified framework for multi-agent multi-view video world modeling using a frozen VGGT backbone for implicit 3D understanding. It supports scalable multi-agent control and autoregressive inference.

OpenPose: real-time multi-person 2D pose estimation with constant-time body detection
https://ramdi.fr/github-stars/openpose-real-time-multi-person-2d-pose-estimation-with-constant-time-body-detection/
Mon, 04 May 2026 10:23:02 +0000
OpenPose is a C++ library for real-time multi-person 2D pose estimation using Part Affinity Fields, keeping body detection inference time constant regardless of the number of people in the image.

PEAR: real-time expressive 3D human mesh recovery at 100 FPS
https://ramdi.fr/github-stars/pear-real-time-expressive-3d-human-mesh-recovery-at-100-fps/
Mon, 04 May 2026 10:23:02 +0000
PEAR predicts expressive 3D human mesh parameters for body, hands, and face simultaneously at 100 FPS using a pixel-aligned architecture based on PyTorch and SMPL-X models.

Viseron: a modular, self-hosted AI video surveillance platform
https://ramdi.fr/github-stars/viseron-a-modular-self-hosted-ai-video-surveillance-platform/
Mon, 04 May 2026 10:23:02 +0000
Viseron is a self-hosted, local-only AI NVR (network video recorder) platform written in Python, with modular AI features for privacy-focused video surveillance. It runs entirely on local hardware and is deployed with Docker.

Cupid: feed-forward 3D reconstruction with joint camera pose estimation from single images
https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/
Mon, 04 May 2026 10:23:01 +0000
Cupid is a feed-forward 3D reconstruction model that jointly estimates camera pose and reconstructs 3D objects from single 2D images, outputting textured 3D meshes and radiance fields in seconds.

NAS3R: Self-supervised 3D reconstruction and camera pose estimation with Gaussian splatting
https://ramdi.fr/github-stars/nas3r-self-supervised-3d-reconstruction-and-camera-pose-estimation-with-gaussian-splatting/
Mon, 04 May 2026 10:23:01 +0000
NAS3R enables self-supervised 3D geometry and camera parameter estimation without ground-truth data, using Gaussian splatting and a VGGT backbone. It supports multi-view setups and optional pretrained initialization.

NOVA3R: Non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images
https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/
Mon, 04 May 2026 10:23:01 +0000
NOVA3R implements a non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images, recovering physically plausible geometry for occluded regions.

OmniStream: a multi-frame transformer for continuous video stream perception
https://ramdi.fr/github-stars/omnistream-a-multi-frame-transformer-for-continuous-video-stream-perception/
Mon, 04 May 2026 10:23:01 +0000
OmniStream uses a multi-frame transformer to process continuous video streams with patch-level temporal indexing, supporting downstream vision-language-action tasks.

PromptHMR: integrating promptable architecture for 3D human mesh recovery from monocular inputs
https://ramdi.fr/github-stars/prompthmr-integrating-promptable-architecture-for-3d-human-mesh-recovery-from-monocular-inputs/
Mon, 04 May 2026 10:23:01 +0000
PromptHMR adapts SAM’s promptable design to 3D human mesh recovery, integrating SLAM, pose detection, and SMPL models into a unified pipeline for monocular images and videos.

SceneMaker: a decoupled framework for 3D scene generation with de-occlusion
https://ramdi.fr/github-stars/scenemaker-a-decoupled-framework-for-3d-scene-generation-with-de-occlusion/
Mon, 04 May 2026 10:23:01 +0000
SceneMaker separates de-occlusion from 3D object generation to handle occluded open-set scenes. It uses FLUX Kontext and Step1X-3D, with code and checkpoints available.

SimRecon: compositional 3D scene reconstruction with viewpoint optimization and semantic graph synthesis
https://ramdi.fr/github-stars/simrecon-compositional-3d-scene-reconstruction-with-viewpoint-optimization-and-semantic-graph-synthesis/
Mon, 04 May 2026 10:23:01 +0000
SimRecon converts real-world videos into simulation-ready 3D scenes by combining geometry reconstruction, instance segmentation, viewpoint optimization, and semantic scene graph synthesis.

face_recognition: easy deep learning face recognition in Python with dlib
https://ramdi.fr/github-stars/face-recognition-easy-deep-learning-face-recognition-in-python-with-dlib/
Sat, 02 May 2026 20:07:04 +0000
face_recognition provides a simple Python API and CLI for highly accurate face detection and recognition using dlib’s deep learning model. It supports facial landmarks and multi-core processing.

Deep-Live-Cam: Real-time face swapping optimized across diverse hardware with ONNX Runtime
https://ramdi.fr/github-stars/deep-live-cam-real-time-face-swapping-optimized-across-diverse-hardware-with-onnx-runtime/
Sun, 26 Apr 2026 17:51:11 +0000
Deep-Live-Cam offers real-time face swapping and deepfake video generation using ONNX Runtime, with multiple execution providers for optimized performance on GPUs, CPUs, and Apple Silicon.

Hands-on with YOLOv5: A practical deep dive into Ultralytics' PyTorch vision model
https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/
Sun, 26 Apr 2026 09:31:26 +0000
YOLOv5 by Ultralytics offers an accessible, fast, and accurate PyTorch-based computer vision toolkit for object detection, segmentation, and classification. Explore its architecture, strengths, and quickstart usage.