<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Deep-Learning on Noureddine RAMDI</title><link>https://ramdi.fr/tags/deep-learning/</link><description>Recent content in Deep-Learning on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/deep-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>ALICE: a self-contained YOLO dataset management toolkit with a creative single-file Python builder</title><link>https://ramdi.fr/github-stars/alice-a-self-contained-yolo-dataset-management-toolkit-with-a-creative-single-file-python-builder/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/alice-a-self-contained-yolo-dataset-management-toolkit-with-a-creative-single-file-python-builder/</guid><description>ALICE is a Python-based toolkit for managing YOLO training datasets from home camera setups, featuring a unique single-file builder and seamless Frigate NVR integration.</description></item><item><title>DeepSpeed: scalable deep learning optimization with extensible hardware support</title><link>https://ramdi.fr/github-stars/deepspeed-scalable-deep-learning-optimization-with-extensible-hardware-support/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/deepspeed-scalable-deep-learning-optimization-with-extensible-hardware-support/</guid><description>DeepSpeed is a Python library that optimizes large-scale deep learning training with multi-hardware support and JIT CUDA extensions. 
Explore its architecture, strengths, and quick installation.</description></item><item><title>DiT4DiT: Vision-Action Modeling with Video Transformers for Real-Time Humanoid Robot Control</title><link>https://ramdi.fr/github-stars/dit4dit-vision-action-modeling-with-video-transformers-for-real-time-humanoid-robot-control/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/dit4dit-vision-action-modeling-with-video-transformers-for-real-time-humanoid-robot-control/</guid><description>DiT4DiT uses a frozen Cosmos-Predict2.5 video transformer backbone combined with flow-matching action heads to model robot actions as video latent transitions, achieving near-perfect success on LIBERO and real-time humanoid control.</description></item><item><title>Hivemind: decentralized peer-to-peer deep learning with PyTorch</title><link>https://ramdi.fr/github-stars/hivemind-decentralized-peer-to-peer-deep-learning-with-pytorch/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/hivemind-decentralized-peer-to-peer-deep-learning-with-pytorch/</guid><description>Hivemind is a PyTorch library enabling decentralized deep learning over the internet using a peer-to-peer Distributed Hash Table (DHT). It supports fault-tolerant training and decentralized parameter averaging without global sync.</description></item><item><title>ML-From-Scratch: Exploring Machine Learning Fundamentals with Pure Python and NumPy</title><link>https://ramdi.fr/github-stars/ml-from-scratch-exploring-machine-learning-fundamentals-with-pure-python-and-numpy/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/ml-from-scratch-exploring-machine-learning-fundamentals-with-pure-python-and-numpy/</guid><description>ML-From-Scratch offers bare-bones Python implementations of key machine learning algorithms using only NumPy, focusing on transparency over efficiency. 
Explore how it demystifies ML fundamentals.</description></item><item><title>OmniGen2: a unified multimodal generation model with separate decoding paths for text and images</title><link>https://ramdi.fr/github-stars/omnigen2-a-unified-multimodal-generation-model-with-separate-decoding-paths-for-text-and-images/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/omnigen2-a-unified-multimodal-generation-model-with-separate-decoding-paths-for-text-and-images/</guid><description>OmniGen2 unifies visual understanding, text-to-image generation, and image editing using distinct decoding pathways for text and images, built on Qwen-VL-2.5 with CPU offloading for accessibility.</description></item><item><title>OverlapNet: Siamese networks for loop closure detection in 3D LiDAR SLAM</title><link>https://ramdi.fr/github-stars/overlapnet-siamese-networks-for-loop-closure-detection-in-3d-lidar-slam/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/overlapnet-siamese-networks-for-loop-closure-detection-in-3d-lidar-slam/</guid><description>OverlapNet uses Siamese networks on 2D range images from 3D LiDAR to detect loop closures by predicting overlap and relative yaw angle simultaneously. Practical demos included.</description></item><item><title>Pixal3D: pixel-aligned 3D asset generation from a single image with projection conditioning</title><link>https://ramdi.fr/github-stars/pixal3d-pixel-aligned-3d-asset-generation-from-a-single-image-with-projection-conditioning/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/pixal3d-pixel-aligned-3d-asset-generation-from-a-single-image-with-projection-conditioning/</guid><description>Pixal3D generates high-fidelity 3D assets with PBR textures from a single image using pixel-aligned projection conditioning. 
It offers a three-stage cascade and low-VRAM mode for consumer GPUs.</description></item><item><title>SAM3-UNet: Adapting Meta's SAM3 for efficient dense prediction with a lightweight U-Net decoder</title><link>https://ramdi.fr/github-stars/sam3-unet-adapting-meta-s-sam3-for-efficient-dense-prediction-with-a-lightweight-u-net-decoder/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/sam3-unet-adapting-meta-s-sam3-for-efficient-dense-prediction-with-a-lightweight-u-net-decoder/</guid><description>SAM3-UNet adapts Meta&amp;rsquo;s SAM3 foundation model for dense prediction tasks using a parameter-efficient adapter and U-Net decoder, enabling training under 6 GB GPU memory.</description></item><item><title>SVFR: unified video face restoration with task-conditioned stable video diffusion</title><link>https://ramdi.fr/github-stars/svfr-unified-video-face-restoration-with-task-conditioned-stable-video-diffusion/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/svfr-unified-video-face-restoration-with-task-conditioned-stable-video-diffusion/</guid><description>SVFR combines blind face restoration, colorization, and inpainting in a single stable video diffusion model, enabling efficient multi-task video face enhancement.</description></item><item><title>Tencent HY-World 2.0: multi-modal pipeline for persistent, editable 3D world generation</title><link>https://ramdi.fr/github-stars/tencent-hy-world-2-0-multi-modal-pipeline-for-persistent-editable-3d-world-generation/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/tencent-hy-world-2-0-multi-modal-pipeline-for-persistent-editable-3d-world-generation/</guid><description>Tencent&amp;rsquo;s HY-World 2.0 generates persistent 3D assets from text, images, or video using a four-stage pipeline. 
It outputs editable worlds compatible with Blender, Unity, and Unreal Engine.</description></item><item><title>Tracing deep learning step-by-step in Excel: a hands-on guide to ai-by-hand-excel</title><link>https://ramdi.fr/github-stars/tracing-deep-learning-step-by-step-in-excel-a-hands-on-guide-to-ai-by-hand-excel/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/tracing-deep-learning-step-by-step-in-excel-a-hands-on-guide-to-ai-by-hand-excel/</guid><description>Explore how ai-by-hand-excel implements deep learning architectures like Transformers entirely in Excel formulas, exposing the math behind AI step-by-step without code.</description></item><item><title>CodeFormer: Deep learning-based blind face restoration with fidelity control</title><link>https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/</guid><description>CodeFormer uses a codebook transformer architecture for blind face restoration, letting users control the tradeoff between quality and fidelity with a unique fidelity weight parameter.</description></item><item><title>Medical-SAM3: adapting foundation models for prompt-driven medical image segmentation</title><link>https://ramdi.fr/github-stars/medical-sam3-adapting-foundation-models-for-prompt-driven-medical-image-segmentation/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/medical-sam3-adapting-foundation-models-for-prompt-driven-medical-image-segmentation/</guid><description>Medical-SAM3 adapts the SAM3 foundation model for universal prompt-driven medical image segmentation, offering pretrained weights and evaluation tools on diverse medical datasets.</description></item><item><title>A curated 100-day machine learning journey with code and 
resources</title><link>https://ramdi.fr/github-stars/a-curated-100-day-machine-learning-journey-with-code-and-resources/</link><pubDate>Mon, 04 May 2026 10:23:03 +0000</pubDate><guid>https://ramdi.fr/github-stars/a-curated-100-day-machine-learning-journey-with-code-and-resources/</guid><description>Explore a 100-day machine learning coding challenge combining classical algorithms, deep learning, and curated resources. A practical, day-by-day learning path for self-directed devs.</description></item>
<item><title>AniGen: GPU-accelerated 3D animation generation with Python and CUDA</title><link>https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/</guid><description>AniGen is a Linux-only Python project for 3D animation generation using NVIDIA GPUs and CUDA. It integrates PyTorch, spconv, and pytorch3d with a smooth setup script for complex dependencies.</description></item>
<item><title>Awesome-Deblurring: A comprehensive academic resource on image and video deblurring techniques</title><link>https://ramdi.fr/github-stars/awesome-deblurring-a-comprehensive-academic-resource-on-image-and-video-deblurring-techniques/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/awesome-deblurring-a-comprehensive-academic-resource-on-image-and-video-deblurring-techniques/</guid><description>Awesome-Deblurring compiles 100+ key papers tracing image and video deblurring from classical optimization to modern deep learning, serving as a go-to bibliography for researchers and developers.</description></item>
<item><title>DIMO: Distilling Diverse 3D Motion Priors for Arbitrary Object Motion Synthesis</title><link>https://ramdi.fr/github-stars/dimo-distilling-diverse-3d-motion-priors-for-arbitrary-object-motion-synthesis/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/dimo-distilling-diverse-3d-motion-priors-for-arbitrary-object-motion-synthesis/</guid><description>DIMO distills motion priors from text-conditioned and multi-view video models into a shared latent space, enabling diverse 3D motion generation for arbitrary objects using 3D Gaussian splatting and 4D rendering.</description></item>
<item><title>Magika: Google's deep learning system for fast, accurate file type detection</title><link>https://ramdi.fr/github-stars/magika-google-s-deep-learning-system-for-fast-accurate-file-type-detection/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/magika-google-s-deep-learning-system-for-fast-accurate-file-type-detection/</guid><description>Magika replaces magic-byte heuristics with a tiny deep learning model for file type detection, achieving ~99% accuracy across 200+ types with 5ms CPU inference.</description></item>
<item><title>Omni-Diffusion: unified any-to-any multimodal generation with masked discrete diffusion</title><link>https://ramdi.fr/github-stars/omni-diffusion-unified-any-to-any-multimodal-generation-with-masked-discrete-diffusion/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/omni-diffusion-unified-any-to-any-multimodal-generation-with-masked-discrete-diffusion/</guid><description>Omni-Diffusion models text, image, and speech tokens jointly via masked discrete diffusion, enabling any-to-any multimodal generation with a single unified model.</description></item>
<item><title>Understanding LLM internals: a hands-on guide to transformers and attention math</title><link>https://ramdi.fr/github-stars/understanding-llm-internals-a-hands-on-guide-to-transformers-and-attention-math/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/understanding-llm-internals-a-hands-on-guide-to-transformers-and-attention-math/</guid><description>A curated repo breaking down large language model internals with numeric attention math, tokenization, and transformer architecture, targeting engineers who want to understand LLMs under the hood.</description></item>
<item><title>Cupid: feed-forward 3D reconstruction with joint camera pose estimation from single images</title><link>https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/</guid><description>Cupid is a feed-forward 3D reconstruction model that jointly estimates camera pose and reconstructs 3D objects from single 2D images, outputting textured 3D meshes and radiance fields in seconds.</description></item>
<item><title>deepseek_ocr_app: full-stack OCR with multi-format PDF export and real-time progress</title><link>https://ramdi.fr/github-stars/deepseek-ocr-app-full-stack-ocr-with-multi-format-pdf-export-and-real-time-progress/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/deepseek-ocr-app-full-stack-ocr-with-multi-format-pdf-export-and-real-time-progress/</guid><description>deepseek_ocr_app combines React and FastAPI to offer powerful OCR for images and multipage PDFs with exports to Markdown, HTML, DOCX, and JSON.
It features real-time progress tracking and bounding box visualization.</description></item><item><title>FinRL: open-source framework for financial reinforcement learning with a train-test-trade pipeline</title><link>https://ramdi.fr/github-stars/finrl-open-source-framework-for-financial-reinforcement-learning-with-a-train-test-trade-pipeline/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/finrl-open-source-framework-for-financial-reinforcement-learning-with-a-train-test-trade-pipeline/</guid><description>FinRL provides an open-source three-layer architecture for financial reinforcement learning with 5 DRL agents and 14+ data sources. Great for learning DRL in finance.</description></item><item><title>NOVA3R: Non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images</title><link>https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/</guid><description>NOVA3R implements a non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images, recovering occluded geometry with physical plausibility.</description></item><item><title>OpenMythos: Exploring recurrent-depth transformers with input injection for sustained reasoning</title><link>https://ramdi.fr/github-stars/openmythos-exploring-recurrent-depth-transformers-with-input-injection-for-sustained-reasoning/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/openmythos-exploring-recurrent-depth-transformers-with-input-injection-for-sustained-reasoning/</guid><description>OpenMythos implements a recurrent-depth transformer that recycles layers via looped blocks, using input injection to prevent 
signal drift. It scales from 1B to 1T parameters with up to 1M token context.</description></item><item><title>SceneMaker: a decoupled framework for 3D scene generation with de-occlusion</title><link>https://ramdi.fr/github-stars/scenemaker-a-decoupled-framework-for-3d-scene-generation-with-de-occlusion/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/scenemaker-a-decoupled-framework-for-3d-scene-generation-with-de-occlusion/</guid><description>SceneMaker separates de-occlusion from 3D object generation to handle occluded open-set scenes. It uses FLUX Kontext and Step1X-3D, with code and checkpoints available.</description></item><item><title>Tencent Hunyuan3D-Part: a two-stage pipeline for semantic 3D mesh part segmentation and generation</title><link>https://ramdi.fr/github-stars/tencent-hunyuan3d-part-a-two-stage-pipeline-for-semantic-3d-mesh-part-segmentation-and-generation/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/tencent-hunyuan3d-part-a-two-stage-pipeline-for-semantic-3d-mesh-part-segmentation-and-generation/</guid><description>Tencent&amp;rsquo;s Hunyuan3D-Part offers a two-model pipeline for 3D mesh part segmentation with P3-SAM and high-fidelity part generation via X-Part, targeting semantic mesh decomposition.</description></item><item><title>annotated_deep_learning_paper_implementations: annotated PyTorch implementations of key deep learning papers</title><link>https://ramdi.fr/github-stars/annotated-deep-learning-paper-implementations-annotated-pytorch-implementations-of-key-deep-learning-papers/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/annotated-deep-learning-paper-implementations-annotated-pytorch-implementations-of-key-deep-learning-papers/</guid><description>This repo provides annotated PyTorch implementations of major deep learning papers with side-by-side explanations, aiding understanding and 
prototyping.</description></item><item><title>face_recognition: easy deep learning face recognition in Python with dlib</title><link>https://ramdi.fr/github-stars/face-recognition-easy-deep-learning-face-recognition-in-python-with-dlib/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/face-recognition-easy-deep-learning-face-recognition-in-python-with-dlib/</guid><description>face_recognition provides a simple Python API and CLI for highly accurate face detection and recognition using dlib&amp;rsquo;s deep learning model. It supports facial landmarks and multi-core processing.</description></item><item><title>Dive into Deep Learning (D2L.ai) Chinese Edition: An interactive textbook bridging theory and code</title><link>https://ramdi.fr/github-stars/dive-into-deep-learning-d2l-ai-chinese-edition-an-interactive-textbook-bridging-theory-and-code/</link><pubDate>Sun, 26 Apr 2026 17:51:11 +0000</pubDate><guid>https://ramdi.fr/github-stars/dive-into-deep-learning-d2l-ai-chinese-edition-an-interactive-textbook-bridging-theory-and-code/</guid><description>Dive into Deep Learning Chinese edition offers an interactive, code-driven deep learning textbook in Python, integrating theory with runnable examples for hands-on learning.</description></item><item><title>PyTorch's dynamic neural networks and tape-based autograd: a deep dive into flexible deep learning</title><link>https://ramdi.fr/github-stars/pytorch-s-dynamic-neural-networks-and-tape-based-autograd-a-deep-dive-into-flexible-deep-learning/</link><pubDate>Sun, 26 Apr 2026 17:51:11 +0000</pubDate><guid>https://ramdi.fr/github-stars/pytorch-s-dynamic-neural-networks-and-tape-based-autograd-a-deep-dive-into-flexible-deep-learning/</guid><description>Explore PyTorch&amp;rsquo;s unique tape-based autograd and dynamic neural networks architecture that enables flexible model development and efficient GPU-accelerated tensor computation.</description></item><item><title>TensorFlow: a 
versatile platform powering machine learning from research to production</title><link>https://ramdi.fr/github-stars/tensorflow-a-versatile-platform-powering-machine-learning-from-research-to-production/</link><pubDate>Sun, 26 Apr 2026 17:51:11 +0000</pubDate><guid>https://ramdi.fr/github-stars/tensorflow-a-versatile-platform-powering-machine-learning-from-research-to-production/</guid><description>TensorFlow is a comprehensive open-source machine learning platform with stable multi-language APIs and broad hardware support, evolving from research prototype to production-ready ecosystem.</description></item><item><title>Hands-on with YOLOv5: A practical deep dive into Ultralytics' PyTorch vision model</title><link>https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/</link><pubDate>Sun, 26 Apr 2026 09:31:26 +0000</pubDate><guid>https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/</guid><description>YOLOv5 by Ultralytics offers an accessible, fast, and accurate PyTorch-based computer vision toolkit for object detection, segmentation, and classification. Explore its architecture, strengths, and quickstart usage.</description></item><item><title>Keras 3: Multi-backend deep learning framework simplifying model development across JAX, TensorFlow, and PyTorch</title><link>https://ramdi.fr/github-stars/keras-3-multi-backend-deep-learning-framework-simplifying-model-development-across-jax-tensorflow-and-pytorch/</link><pubDate>Sun, 26 Apr 2026 09:31:26 +0000</pubDate><guid>https://ramdi.fr/github-stars/keras-3-multi-backend-deep-learning-framework-simplifying-model-development-across-jax-tensorflow-and-pytorch/</guid><description>Keras 3 introduces a multi-backend architecture supporting JAX, TensorFlow, PyTorch, and OpenVINO, enabling flexible, accelerated deep learning model development with up to 350% speedups.</description></item></channel></rss>