Deep-Learning on Noureddine RAMDI

https://ramdi.fr/tags/deep-learning/
Recent content in Deep-Learning on Noureddine RAMDI
Sat, 23 May 2026 20:41:27 +0000

ALICE: a self-contained YOLO dataset management toolkit with a creative single-file Python builder
https://ramdi.fr/github-stars/alice-a-self-contained-yolo-dataset-management-toolkit-with-a-creative-single-file-python-builder/
Sat, 23 May 2026 20:41:14 +0000
ALICE is a Python-based toolkit for managing YOLO training datasets from home camera setups, featuring a unique single-file builder and seamless Frigate NVR integration.

DeepSpeed: scalable deep learning optimization with extensible hardware support
https://ramdi.fr/github-stars/deepspeed-scalable-deep-learning-optimization-with-extensible-hardware-support/
Sat, 23 May 2026 20:41:14 +0000
DeepSpeed is a Python library that optimizes large-scale deep learning training with multi-hardware support and JIT CUDA extensions. Explore its architecture, strengths, and quick installation.

DiT4DiT: Vision-Action Modeling with Video Transformers for Real-Time Humanoid Robot Control
https://ramdi.fr/github-stars/dit4dit-vision-action-modeling-with-video-transformers-for-real-time-humanoid-robot-control/
Sat, 23 May 2026 20:41:14 +0000
DiT4DiT uses a frozen Cosmos-Predict2.5 video transformer backbone combined with flow-matching action heads to model robot actions as video latent transitions, achieving near-perfect success on LIBERO and real-time humanoid control.

Hivemind: decentralized peer-to-peer deep learning with PyTorch
https://ramdi.fr/github-stars/hivemind-decentralized-peer-to-peer-deep-learning-with-pytorch/
Sat, 23 May 2026 20:41:14 +0000
Hivemind is a PyTorch library enabling decentralized deep learning over the internet using a peer-to-peer Distributed Hash Table (DHT). It supports fault-tolerant training and decentralized parameter averaging without global sync.

ML-From-Scratch: Exploring Machine Learning Fundamentals with Pure Python and NumPy
https://ramdi.fr/github-stars/ml-from-scratch-exploring-machine-learning-fundamentals-with-pure-python-and-numpy/
Sat, 23 May 2026 20:41:14 +0000
ML-From-Scratch offers bare-bones Python implementations of key machine learning algorithms using only NumPy, focusing on transparency over efficiency. Explore how it demystifies ML fundamentals.

OmniGen2: a unified multimodal generation model with separate decoding paths for text and images
https://ramdi.fr/github-stars/omnigen2-a-unified-multimodal-generation-model-with-separate-decoding-paths-for-text-and-images/
Sat, 23 May 2026 20:41:14 +0000
OmniGen2 unifies visual understanding, text-to-image generation, and image editing using distinct decoding pathways for text and images, built on Qwen-VL-2.5 with CPU offloading for accessibility.

OverlapNet: Siamese networks for loop closure detection in 3D LiDAR SLAM
https://ramdi.fr/github-stars/overlapnet-siamese-networks-for-loop-closure-detection-in-3d-lidar-slam/
Sat, 23 May 2026 20:41:14 +0000
OverlapNet uses Siamese networks on 2D range images from 3D LiDAR to detect loop closures by predicting overlap and relative yaw angle simultaneously. Practical demos included.

Pixal3D: pixel-aligned 3D asset generation from a single image with projection conditioning
https://ramdi.fr/github-stars/pixal3d-pixel-aligned-3d-asset-generation-from-a-single-image-with-projection-conditioning/
Sat, 23 May 2026 20:41:14 +0000
Pixal3D generates high-fidelity 3D assets with PBR textures from a single image using pixel-aligned projection conditioning. It offers a three-stage cascade and low-VRAM mode for consumer GPUs.

SAM3-UNet: Adapting Meta's SAM3 for efficient dense prediction with a lightweight U-Net decoder
https://ramdi.fr/github-stars/sam3-unet-adapting-meta-s-sam3-for-efficient-dense-prediction-with-a-lightweight-u-net-decoder/
Sat, 23 May 2026 20:41:14 +0000
SAM3-UNet adapts Meta’s SAM3 foundation model for dense prediction tasks using a parameter-efficient adapter and U-Net decoder, enabling training under 6 GB GPU memory.

SVFR: unified video face restoration with task-conditioned stable video diffusion
https://ramdi.fr/github-stars/svfr-unified-video-face-restoration-with-task-conditioned-stable-video-diffusion/
Sat, 23 May 2026 20:41:14 +0000
SVFR combines blind face restoration, colorization, and inpainting in a single stable video diffusion model, enabling efficient multi-task video face enhancement.

Tencent HY-World 2.0: multi-modal pipeline for persistent, editable 3D world generation
https://ramdi.fr/github-stars/tencent-hy-world-2-0-multi-modal-pipeline-for-persistent-editable-3d-world-generation/
Sat, 23 May 2026 20:41:14 +0000
Tencent’s HY-World 2.0 generates persistent 3D assets from text, images, or video using a four-stage pipeline. It outputs editable worlds compatible with Blender, Unity, and Unreal Engine.

Tracing deep learning step-by-step in Excel: a hands-on guide to ai-by-hand-excel
https://ramdi.fr/github-stars/tracing-deep-learning-step-by-step-in-excel-a-hands-on-guide-to-ai-by-hand-excel/
Sat, 23 May 2026 20:41:14 +0000
Explore how ai-by-hand-excel implements deep learning architectures like Transformers entirely in Excel formulas, exposing the math behind AI step-by-step without code.

CodeFormer: Deep learning-based blind face restoration with fidelity control
https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/
Tue, 05 May 2026 13:37:39 +0000
CodeFormer uses a codebook transformer architecture for blind face restoration, letting users control the tradeoff between quality and fidelity with a unique fidelity weight parameter.

Medical-SAM3: adapting foundation models for prompt-driven medical image segmentation
https://ramdi.fr/github-stars/medical-sam3-adapting-foundation-models-for-prompt-driven-medical-image-segmentation/
Tue, 05 May 2026 13:37:39 +0000
Medical-SAM3 adapts the SAM3 foundation model for universal prompt-driven medical image segmentation, offering pretrained weights and evaluation tools on diverse medical datasets.

A curated 100-day machine learning journey with code and resources
https://ramdi.fr/github-stars/a-curated-100-day-machine-learning-journey-with-code-and-resources/
Mon, 04 May 2026 10:23:03 +0000
Explore a 100-day machine learning coding challenge combining classical algorithms, deep learning,
and curated resources. A practical, day-by-day learning path for self-directed devs.

AniGen: GPU-accelerated 3D animation generation with Python and CUDA
https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/
Mon, 04 May 2026 10:23:02 +0000
AniGen is a Linux-only Python project for 3D animation generation using NVIDIA GPUs and CUDA. It integrates PyTorch, spconv, and pytorch3d with a smooth setup script for complex dependencies.

Awesome-Deblurring: A comprehensive academic resource on image and video deblurring techniques
https://ramdi.fr/github-stars/awesome-deblurring-a-comprehensive-academic-resource-on-image-and-video-deblurring-techniques/
Mon, 04 May 2026 10:23:02 +0000
Awesome-Deblurring compiles 100+ key papers tracing image and video deblurring from classical optimization to modern deep learning, serving as a go-to bibliography for researchers and developers.

DIMO: Distilling Diverse 3D Motion Priors for Arbitrary Object Motion Synthesis
https://ramdi.fr/github-stars/dimo-distilling-diverse-3d-motion-priors-for-arbitrary-object-motion-synthesis/
Mon, 04 May 2026 10:23:02 +0000
DIMO distills motion priors from text-conditioned and multi-view video models into a shared latent space, enabling diverse 3D motion generation for arbitrary objects using 3D Gaussian splatting and 4D rendering.

Magika: Google's deep learning system for fast, accurate file type detection
https://ramdi.fr/github-stars/magika-google-s-deep-learning-system-for-fast-accurate-file-type-detection/
Mon, 04 May 2026 10:23:02 +0000
Magika replaces magic-byte heuristics with a tiny deep learning model for file type detection, achieving ~99% accuracy across 200+ types with 5ms CPU inference.

Omni-Diffusion: unified any-to-any multimodal generation with masked discrete diffusion
https://ramdi.fr/github-stars/omni-diffusion-unified-any-to-any-multimodal-generation-with-masked-discrete-diffusion/
Mon, 04 May 2026 10:23:02 +0000
Omni-Diffusion models text, image, and speech tokens jointly via masked discrete diffusion, enabling any-to-any multimodal generation with a single unified model.

Understanding LLM internals: a hands-on guide to transformers and attention math
https://ramdi.fr/github-stars/understanding-llm-internals-a-hands-on-guide-to-transformers-and-attention-math/
Mon, 04 May 2026 10:23:02 +0000
A curated repo breaking down large language model internals with numeric attention math, tokenization, and transformer architecture, targeting engineers who want to understand LLMs under the hood.

Cupid: feed-forward 3D reconstruction with joint camera pose estimation from single images
https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/
Mon, 04 May 2026 10:23:01 +0000
Cupid is a feed-forward 3D reconstruction model that jointly estimates camera pose and reconstructs 3D objects from single 2D images, outputting textured 3D meshes and radiance fields in seconds.

deepseek_ocr_app: full-stack OCR with multi-format PDF export and real-time progress
https://ramdi.fr/github-stars/deepseek-ocr-app-full-stack-ocr-with-multi-format-pdf-export-and-real-time-progress/
Mon, 04 May 2026 10:23:01 +0000
deepseek_ocr_app combines React and FastAPI to offer powerful OCR for images and multipage PDFs with exports to Markdown, HTML, DOCX, and JSON. It features real-time progress tracking and bounding box visualization.

FinRL: open-source framework for financial reinforcement learning with a train-test-trade pipeline
https://ramdi.fr/github-stars/finrl-open-source-framework-for-financial-reinforcement-learning-with-a-train-test-trade-pipeline/
Mon, 04 May 2026 10:23:01 +0000
FinRL provides an open-source three-layer architecture for financial reinforcement learning with 5 DRL agents and 14+ data sources. Great for learning DRL in finance.

NOVA3R: Non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images
https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/
Mon, 04 May 2026 10:23:01 +0000
NOVA3R implements a non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images, recovering occluded geometry with physical plausibility.

OpenMythos: Exploring recurrent-depth transformers with input injection for sustained reasoning
https://ramdi.fr/github-stars/openmythos-exploring-recurrent-depth-transformers-with-input-injection-for-sustained-reasoning/
Mon, 04 May 2026 10:23:01 +0000
OpenMythos
implements a recurrent-depth transformer that recycles layers via looped blocks, using input injection to prevent signal drift. It scales from 1B to 1T parameters with up to 1M token context.

SceneMaker: a decoupled framework for 3D scene generation with de-occlusion
https://ramdi.fr/github-stars/scenemaker-a-decoupled-framework-for-3d-scene-generation-with-de-occlusion/
Mon, 04 May 2026 10:23:01 +0000
SceneMaker separates de-occlusion from 3D object generation to handle occluded open-set scenes. It uses FLUX Kontext and Step1X-3D, with code and checkpoints available.

Tencent Hunyuan3D-Part: a two-stage pipeline for semantic 3D mesh part segmentation and generation
https://ramdi.fr/github-stars/tencent-hunyuan3d-part-a-two-stage-pipeline-for-semantic-3d-mesh-part-segmentation-and-generation/
Mon, 04 May 2026 10:23:01 +0000
Tencent’s Hunyuan3D-Part offers a two-model pipeline for 3D mesh part segmentation with P3-SAM and high-fidelity part generation via X-Part, targeting semantic mesh decomposition.

annotated_deep_learning_paper_implementations: annotated PyTorch implementations of key deep learning papers
https://ramdi.fr/github-stars/annotated-deep-learning-paper-implementations-annotated-pytorch-implementations-of-key-deep-learning-papers/
Sat, 02 May 2026 20:07:04 +0000
This repo provides annotated PyTorch implementations of major deep learning papers with side-by-side explanations, aiding understanding and prototyping.

face_recognition: easy deep learning face recognition in Python with dlib
https://ramdi.fr/github-stars/face-recognition-easy-deep-learning-face-recognition-in-python-with-dlib/
Sat, 02 May 2026 20:07:04 +0000
face_recognition provides a simple Python API and CLI for highly accurate face detection and recognition using dlib’s deep learning model. It supports facial landmarks and multi-core processing.

Dive into Deep Learning (D2L.ai) Chinese Edition: An interactive textbook bridging theory and code
https://ramdi.fr/github-stars/dive-into-deep-learning-d2l-ai-chinese-edition-an-interactive-textbook-bridging-theory-and-code/
Sun, 26 Apr 2026 17:51:11 +0000
Dive into Deep Learning Chinese edition offers an interactive, code-driven deep learning textbook in Python, integrating theory with runnable examples for hands-on learning.

PyTorch's dynamic neural networks and tape-based autograd: a deep dive into flexible deep learning
https://ramdi.fr/github-stars/pytorch-s-dynamic-neural-networks-and-tape-based-autograd-a-deep-dive-into-flexible-deep-learning/
Sun, 26 Apr 2026 17:51:11 +0000
Explore PyTorch’s unique tape-based autograd and dynamic neural networks architecture that enables flexible model development and efficient GPU-accelerated tensor computation.

TensorFlow: a versatile platform powering machine learning from research to production
https://ramdi.fr/github-stars/tensorflow-a-versatile-platform-powering-machine-learning-from-research-to-production/
Sun, 26 Apr 2026 17:51:11 +0000
TensorFlow is a comprehensive open-source machine learning platform with stable multi-language APIs and broad hardware support, evolving from research prototype to production-ready ecosystem.

Hands-on with YOLOv5: A practical deep dive into Ultralytics' PyTorch vision model
https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/
Sun, 26 Apr 2026 09:31:26 +0000
YOLOv5 by Ultralytics offers an accessible, fast, and accurate PyTorch-based computer vision toolkit for object detection, segmentation, and classification. Explore its architecture, strengths, and quickstart usage.

Keras 3: Multi-backend deep learning framework simplifying model development across JAX, TensorFlow, and PyTorch
https://ramdi.fr/github-stars/keras-3-multi-backend-deep-learning-framework-simplifying-model-development-across-jax-tensorflow-and-pytorch/
Sun, 26 Apr 2026 09:31:26 +0000
Keras 3 introduces a multi-backend architecture supporting JAX, TensorFlow, PyTorch, and OpenVINO, enabling flexible, accelerated deep learning model development with up to 350% speedups.