<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Computer-Vision on Noureddine RAMDI</title><link>https://ramdi.fr/tags/computer-vision/</link><description>Recent content in Computer-Vision on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/computer-vision/index.xml" rel="self" type="application/rss+xml"/><item><title>3D-RE-GEN: reconstructing editable 3D indoor scenes from a single photo with multi-model AI orchestration</title><link>https://ramdi.fr/github-stars/3d-re-gen-reconstructing-editable-3d-indoor-scenes-from-a-single-photo-with-multi-model-ai-orchestration/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/3d-re-gen-reconstructing-editable-3d-indoor-scenes-from-a-single-photo-with-multi-model-ai-orchestration/</guid><description>3D-RE-GEN reconstructs complete editable 3D indoor scenes from a single RGB photo. 
It integrates SAM, Hunyuan3D-2.0, and VGGT models in a modular Python pipeline.</description></item><item><title>Autodistill: Automating vision model distillation from foundation models to edge deployables</title><link>https://ramdi.fr/github-stars/autodistill-automating-vision-model-distillation-from-foundation-models-to-edge-deployables/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/autodistill-automating-vision-model-distillation-from-foundation-models-to-edge-deployables/</guid><description>Autodistill automates the pipeline from large foundation models to edge-ready vision models using a plugin-based architecture and a natural-language ontology for zero-shot labeling.</description></item><item><title>Comic Translate: AI-driven multi-language comic translation with full-page context</title><link>https://ramdi.fr/github-stars/comic-translate-ai-driven-multi-language-comic-translation-with-full-page-context/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/comic-translate-ai-driven-multi-language-comic-translation-with-full-page-context/</guid><description>Comic Translate uses advanced AI models and a multi-step pipeline for accurate comic translation across languages, combining speech bubble detection, OCR, and LLMs with full-page context.</description></item><item><title>Fast3R: scalable multi-view 3D reconstruction with a single forward pass</title><link>https://ramdi.fr/github-stars/fast3r-scalable-multi-view-3d-reconstruction-with-a-single-forward-pass/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/fast3r-scalable-multi-view-3d-reconstruction-with-a-single-forward-pass/</guid><description>Fast3R from Meta FAIR processes 1000+ unordered images simultaneously for 3D reconstruction using a ViT-Large backbone and multi-view attention, eliminating iterative matching.</description></item><item><title>MASt3R-SLAM: integrating foundation-model 
3D priors into real-time dense SLAM</title><link>https://ramdi.fr/github-stars/mast3r-slam-integrating-foundation-model-3d-priors-into-real-time-dense-slam/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/mast3r-slam-integrating-foundation-model-3d-priors-into-real-time-dense-slam/</guid><description>MASt3R-SLAM integrates a pretrained 3D reconstruction model as a geometry prior in a dense SLAM pipeline, enabling real-time tracking and mapping without classical bundle adjustment or depth sensors.</description></item><item><title>PartCrafter: compositional 3D mesh generation with latent diffusion transformers</title><link>https://ramdi.fr/github-stars/partcrafter-compositional-3d-mesh-generation-with-latent-diffusion-transformers/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/partcrafter-compositional-3d-mesh-generation-with-latent-diffusion-transformers/</guid><description>PartCrafter generates multiple semantically distinct 3D mesh parts from a single RGB image using latent diffusion transformers, enabling structured 3D generation with pretrained models and VLM-based part suggestions.</description></item><item><title>Pixal3D: pixel-aligned 3D asset generation from a single image with projection conditioning</title><link>https://ramdi.fr/github-stars/pixal3d-pixel-aligned-3d-asset-generation-from-a-single-image-with-projection-conditioning/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/pixal3d-pixel-aligned-3d-asset-generation-from-a-single-image-with-projection-conditioning/</guid><description>Pixal3D generates high-fidelity 3D assets with PBR textures from a single image using pixel-aligned projection conditioning. 
It offers a three-stage cascade and a low-VRAM mode for consumer GPUs.</description></item><item><title>SAM3-UNet: Adapting Meta's SAM3 for efficient dense prediction with a lightweight U-Net decoder</title><link>https://ramdi.fr/github-stars/sam3-unet-adapting-meta-s-sam3-for-efficient-dense-prediction-with-a-lightweight-u-net-decoder/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/sam3-unet-adapting-meta-s-sam3-for-efficient-dense-prediction-with-a-lightweight-u-net-decoder/</guid><description>SAM3-UNet adapts Meta&amp;rsquo;s SAM3 foundation model for dense prediction tasks using a parameter-efficient adapter and a U-Net decoder, enabling training with under 6 GB of GPU memory.</description></item><item><title>Tencent HY-World 2.0: multi-modal pipeline for persistent, editable 3D world generation</title><link>https://ramdi.fr/github-stars/tencent-hy-world-2-0-multi-modal-pipeline-for-persistent-editable-3d-world-generation/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/tencent-hy-world-2-0-multi-modal-pipeline-for-persistent-editable-3d-world-generation/</guid><description>Tencent&amp;rsquo;s HY-World 2.0 generates persistent 3D assets from text, images, or video using a four-stage pipeline. 
It outputs editable worlds compatible with Blender, Unity, and Unreal Engine.</description></item><item><title>CodeFormer: Deep learning-based blind face restoration with fidelity control</title><link>https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/</guid><description>CodeFormer uses a codebook transformer architecture for blind face restoration, letting users control the trade-off between quality and fidelity with a dedicated fidelity weight parameter.</description></item><item><title>OVIE: Monocular novel view synthesis without multi-view supervision</title><link>https://ramdi.fr/github-stars/ovie-monocular-novel-view-synthesis-without-multi-view-supervision/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/ovie-monocular-novel-view-synthesis-without-multi-view-supervision/</guid><description>OVIE trains novel view synthesis models using unpaired internet images, avoiding the need for calibrated multi-view datasets. 
It uses Vision Transformers and foundation models for pose and depth encoding.</description></item><item><title>StereoWorld: stereo vision-based 3D-consistent video generation from binocular inputs</title><link>https://ramdi.fr/github-stars/stereoworld-stereo-vision-based-3d-consistent-video-generation-from-binocular-inputs/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/stereoworld-stereo-vision-based-3d-consistent-video-generation-from-binocular-inputs/</guid><description>StereoWorld uses binocular stereo vision cues to guide 3D-consistent stereo video generation, offering a biologically inspired approach to scene geometry understanding.</description></item><item><title>Awesome-Deblurring: A comprehensive academic resource on image and video deblurring techniques</title><link>https://ramdi.fr/github-stars/awesome-deblurring-a-comprehensive-academic-resource-on-image-and-video-deblurring-techniques/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/awesome-deblurring-a-comprehensive-academic-resource-on-image-and-video-deblurring-techniques/</guid><description>Awesome-Deblurring compiles 100+ key papers tracing image and video deblurring from classical optimization to modern deep learning, serving as a go-to bibliography for researchers and developers.</description></item><item><title>MotionCrafter: unified 4D geometry and motion reconstruction from monocular video</title><link>https://ramdi.fr/github-stars/motioncrafter-unified-4d-geometry-and-motion-reconstruction-from-monocular-video/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/motioncrafter-unified-4d-geometry-and-motion-reconstruction-from-monocular-video/</guid><description>MotionCrafter jointly reconstructs 4D geometry and dense motion from monocular video using a unified 4D VAE, eliminating post-optimization. 
This Python framework offers training and visualization tools.</description></item><item><title>MultiWorld: a unified framework for multi-agent multi-view video world modeling</title><link>https://ramdi.fr/github-stars/multiworld-a-unified-framework-for-multi-agent-multi-view-video-world-modeling/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/multiworld-a-unified-framework-for-multi-agent-multi-view-video-world-modeling/</guid><description>MultiWorld offers a unified framework for multi-agent multi-view video world modeling using a frozen VGGT backbone for implicit 3D understanding. It supports scalable multi-agent control and autoregressive inference.</description></item><item><title>OpenPose: real-time multi-person 2D pose estimation with constant-time body detection</title><link>https://ramdi.fr/github-stars/openpose-real-time-multi-person-2d-pose-estimation-with-constant-time-body-detection/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/openpose-real-time-multi-person-2d-pose-estimation-with-constant-time-body-detection/</guid><description>OpenPose is a C++ library for real-time multi-person 2D pose estimation using Part Affinity Fields, enabling constant inference time for body detection regardless of person count.</description></item><item><title>PEAR: real-time expressive 3D human mesh recovery at 100 FPS</title><link>https://ramdi.fr/github-stars/pear-real-time-expressive-3d-human-mesh-recovery-at-100-fps/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/pear-real-time-expressive-3d-human-mesh-recovery-at-100-fps/</guid><description>PEAR predicts expressive 3D human mesh parameters for body, hands, and face simultaneously at 100 FPS using a pixel-aligned architecture based on PyTorch and SMPL-X models.</description></item><item><title>Viseron: a modular, self-hosted AI video surveillance 
platform</title><link>https://ramdi.fr/github-stars/viseron-a-modular-self-hosted-ai-video-surveillance-platform/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/viseron-a-modular-self-hosted-ai-video-surveillance-platform/</guid><description>Viseron is a self-hosted, local-only AI NVR platform written in Python, with modular AI features for privacy-focused video surveillance. It runs fully locally and deploys with Docker.</description></item><item><title>Cupid: feed-forward 3D reconstruction with joint camera pose estimation from single images</title><link>https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/</guid><description>Cupid is a feed-forward 3D reconstruction model that jointly estimates camera pose and reconstructs 3D objects from single 2D images, outputting textured 3D meshes and radiance fields in seconds.</description></item><item><title>NAS3R: Self-supervised 3D reconstruction and camera pose estimation with Gaussian splatting</title><link>https://ramdi.fr/github-stars/nas3r-self-supervised-3d-reconstruction-and-camera-pose-estimation-with-gaussian-splatting/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/nas3r-self-supervised-3d-reconstruction-and-camera-pose-estimation-with-gaussian-splatting/</guid><description>NAS3R enables self-supervised 3D geometry and camera parameter estimation without ground-truth data, using Gaussian splatting and a VGGT backbone. 
It supports multi-view setups and optional pretrained initialization.</description></item><item><title>NOVA3R: Non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images</title><link>https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/</guid><description>NOVA3R implements a non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images, recovering occluded geometry with physical plausibility.</description></item><item><title>OmniStream: a multi-frame transformer for continuous video stream perception</title><link>https://ramdi.fr/github-stars/omnistream-a-multi-frame-transformer-for-continuous-video-stream-perception/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/omnistream-a-multi-frame-transformer-for-continuous-video-stream-perception/</guid><description>OmniStream uses a multi-frame transformer to process continuous video streams with patch-level temporal indexing, supporting downstream vision-language-action tasks.</description></item><item><title>PromptHMR: integrating promptable architecture for 3D human mesh recovery from monocular inputs</title><link>https://ramdi.fr/github-stars/prompthmr-integrating-promptable-architecture-for-3d-human-mesh-recovery-from-monocular-inputs/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/prompthmr-integrating-promptable-architecture-for-3d-human-mesh-recovery-from-monocular-inputs/</guid><description>PromptHMR adapts SAM&amp;rsquo;s promptable design to 3D human mesh recovery, integrating SLAM, pose detection, and SMPL models into a unified pipeline for monocular images and 
videos.</description></item><item><title>SceneMaker: a decoupled framework for 3D scene generation with de-occlusion</title><link>https://ramdi.fr/github-stars/scenemaker-a-decoupled-framework-for-3d-scene-generation-with-de-occlusion/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/scenemaker-a-decoupled-framework-for-3d-scene-generation-with-de-occlusion/</guid><description>SceneMaker separates de-occlusion from 3D object generation to handle occluded open-set scenes. It uses FLUX Kontext and Step1X-3D, with code and checkpoints available.</description></item><item><title>SimRecon: compositional 3D scene reconstruction with viewpoint optimization and semantic graph synthesis</title><link>https://ramdi.fr/github-stars/simrecon-compositional-3d-scene-reconstruction-with-viewpoint-optimization-and-semantic-graph-synthesis/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/simrecon-compositional-3d-scene-reconstruction-with-viewpoint-optimization-and-semantic-graph-synthesis/</guid><description>SimRecon converts real-world videos into simulation-ready 3D scenes by combining geometry reconstruction, instance segmentation, viewpoint optimization, and semantic scene graph synthesis.</description></item><item><title>face_recognition: easy deep learning face recognition in Python with dlib</title><link>https://ramdi.fr/github-stars/face-recognition-easy-deep-learning-face-recognition-in-python-with-dlib/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/face-recognition-easy-deep-learning-face-recognition-in-python-with-dlib/</guid><description>face_recognition provides a simple Python API and CLI for highly accurate face detection and recognition using dlib&amp;rsquo;s deep learning model. 
It supports facial landmarks and multi-core processing.</description></item><item><title>Deep-Live-Cam: Real-time face swapping optimized across diverse hardware with ONNX Runtime</title><link>https://ramdi.fr/github-stars/deep-live-cam-real-time-face-swapping-optimized-across-diverse-hardware-with-onnx-runtime/</link><pubDate>Sun, 26 Apr 2026 17:51:11 +0000</pubDate><guid>https://ramdi.fr/github-stars/deep-live-cam-real-time-face-swapping-optimized-across-diverse-hardware-with-onnx-runtime/</guid><description>Deep-Live-Cam offers real-time face swapping and deepfake video generation using ONNX Runtime with multiple execution providers for optimized performance on GPUs, CPUs, and Apple Silicon.</description></item><item><title>Hands-on with YOLOv5: A practical deep dive into Ultralytics' PyTorch vision model</title><link>https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/</link><pubDate>Sun, 26 Apr 2026 09:31:26 +0000</pubDate><guid>https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/</guid><description>YOLOv5 by Ultralytics offers an accessible, fast, and accurate PyTorch-based computer vision toolkit for object detection, segmentation, and classification. Explore its architecture, strengths, and quickstart usage.</description></item></channel></rss>