Pytorch on Noureddine RAMDI

Recent content in Pytorch on Noureddine RAMDI
https://ramdi.fr/tags/pytorch/
Generator: Hugo
Language: en
Last build date: Sat, 23 May 2026 20:41:27 +0000

APISR: a Python toolkit for AI-based image and video super-resolution with practical inference modes
https://ramdi.fr/github-stars/apisr-a-python-toolkit-for-ai-based-image-and-video-super-resolution-with-practical-inference-modes/
Sat, 23 May 2026 20:41:14 +0000
APISR is a Python repo for AI-powered image and video super-resolution, offering fast Gradio inference and full-featured regular inference with dataset curation tools.

DeepSpeed: scalable deep learning optimization with extensible hardware support
https://ramdi.fr/github-stars/deepspeed-scalable-deep-learning-optimization-with-extensible-hardware-support/
Sat, 23 May 2026 20:41:14 +0000
DeepSpeed is a Python library that optimizes large-scale deep learning training with multi-hardware support and JIT CUDA extensions. Explore its architecture, strengths, and quick installation.

DualSDF: A two-level signed distance function approach for semantic 3D shape manipulation
https://ramdi.fr/github-stars/dualsdf-a-two-level-signed-distance-function-approach-for-semantic-3d-shape-manipulation/
Sat, 23 May 2026 20:41:14 +0000
DualSDF separates coarse semantic structure from fine geometric detail in 3D shape modeling using a two-level signed distance function. It enables intuitive shape edits with pretrained models and a WebGL demo.

Fast3R: scalable multi-view 3D reconstruction with a single forward pass
https://ramdi.fr/github-stars/fast3r-scalable-multi-view-3d-reconstruction-with-a-single-forward-pass/
Sat, 23 May 2026 20:41:14 +0000
Fast3R from Meta FAIR processes 1000+ unordered images simultaneously for 3D reconstruction using a ViT-Large backbone and multi-view attention, eliminating iterative matching.

Hivemind: decentralized peer-to-peer deep learning with PyTorch
https://ramdi.fr/github-stars/hivemind-decentralized-peer-to-peer-deep-learning-with-pytorch/
Sat, 23 May 2026 20:41:14 +0000
Hivemind is a PyTorch library enabling decentralized deep learning over the internet using a peer-to-peer Distributed Hash Table (DHT). It supports fault-tolerant training and decentralized parameter averaging without global sync.

MASt3R-SLAM: integrating foundation-model 3D priors into real-time dense SLAM
https://ramdi.fr/github-stars/mast3r-slam-integrating-foundation-model-3d-priors-into-real-time-dense-slam/
Sat, 23 May 2026 20:41:14 +0000
MASt3R-SLAM integrates a pretrained 3D reconstruction model as a geometry prior in a dense SLAM pipeline, enabling real-time tracking and mapping without classical bundle adjustment or depth sensors.

OmniGen2: a unified multimodal generation model with separate decoding paths for text and images
https://ramdi.fr/github-stars/omnigen2-a-unified-multimodal-generation-model-with-separate-decoding-paths-for-text-and-images/
Sat, 23 May 2026 20:41:14 +0000
OmniGen2 unifies visual understanding, text-to-image generation, and image editing using distinct decoding pathways for text and images. It is built on Qwen2.5-VL and supports CPU offloading for accessibility.

PartCrafter: compositional 3D mesh generation with latent diffusion transformers
https://ramdi.fr/github-stars/partcrafter-compositional-3d-mesh-generation-with-latent-diffusion-transformers/
Sat, 23 May 2026 20:41:14 +0000
PartCrafter generates multiple semantically distinct 3D mesh parts from a single RGB image using latent diffusion transformers, enabling structured 3D generation with pretrained models and VLM-based part suggestions.

SVFR: unified video face restoration with task-conditioned stable video diffusion
https://ramdi.fr/github-stars/svfr-unified-video-face-restoration-with-task-conditioned-stable-video-diffusion/
Sat, 23 May 2026 20:41:14 +0000
SVFR combines blind face restoration, colorization, and inpainting in a single stable video diffusion model, enabling efficient multi-task video face enhancement.

CodeFormer: Deep learning-based blind face restoration with fidelity control
https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/
Tue, 05 May 2026 13:37:39 +0000
CodeFormer uses a codebook transformer architecture for blind face restoration, letting users control the tradeoff between quality and fidelity through a fidelity weight parameter.

AniGen: GPU-accelerated 3D animation generation with Python and CUDA
https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/
Mon, 04 May 2026 10:23:02 +0000
AniGen is a Linux-only Python project for 3D animation generation using NVIDIA GPUs and CUDA. It integrates PyTorch, spconv, and pytorch3d, and provides a setup script for the complex dependencies.

ComfyUI Trellis2: Extending ComfyUI with Dinov3 for 3D-Aware Diffusion Workflows
https://ramdi.fr/github-stars/comfyui-trellis2-extending-comfyui-with-dinov3-for-3d-aware-diffusion-workflows/
Mon, 04 May 2026 10:23:02 +0000
ComfyUI-Trellis2 integrates Facebook’s Dinov3 model into ComfyUI for advanced 3D-aware diffusion workflows. This article breaks down its architecture, strengths, and installation steps.

DIMO: Distilling Diverse 3D Motion Priors for Arbitrary Object Motion Synthesis
https://ramdi.fr/github-stars/dimo-distilling-diverse-3d-motion-priors-for-arbitrary-object-motion-synthesis/
Mon, 04 May 2026 10:23:02 +0000
DIMO distills motion priors from text-conditioned and multi-view video models into a shared latent space, enabling diverse 3D motion generation for arbitrary objects using 3D Gaussian splatting and 4D rendering.

DROID-W: extending SLAM to dynamic, in-the-wild scenes with uncertainty estimation
https://ramdi.fr/github-stars/droid-w-extending-slam-to-dynamic-in-the-wild-scenes-with-uncertainty-estimation/
Mon, 04 May 2026 10:23:02 +0000
DROID-W builds on DROID-SLAM to handle dynamic, in-the-wild scenes by jointly estimating camera pose, scene structure, and dynamic uncertainty using Lie group optimization and metric depth estimation.

Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR
https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/
Mon, 04 May 2026 10:23:02 +0000
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.

Omni-Diffusion: unified any-to-any multimodal generation with masked discrete diffusion
https://ramdi.fr/github-stars/omni-diffusion-unified-any-to-any-multimodal-generation-with-masked-discrete-diffusion/
Mon, 04 May 2026 10:23:02 +0000
Omni-Diffusion models text, image, and speech tokens jointly via masked discrete diffusion, enabling any-to-any multimodal generation with a single unified model.

PEAR: real-time expressive 3D human mesh recovery at 100 FPS
https://ramdi.fr/github-stars/pear-real-time-expressive-3d-human-mesh-recovery-at-100-fps/
Mon, 04 May 2026 10:23:02 +0000
PEAR predicts expressive 3D human mesh parameters for body, hands, and face simultaneously at 100 FPS using a pixel-aligned architecture based on PyTorch and SMPL-X models.

Streaming 3D scene reconstruction with LingBot-Map’s geometric context transformer
https://ramdi.fr/github-stars/streaming-3d-scene-reconstruction-with-lingbot-maps-geometric-context-transformer/
Mon, 04 May 2026 10:23:02 +0000
LingBot-Map performs streaming 3D reconstruction from long image sequences at ~20 FPS using a geometric context transformer and paged KV cache attention for efficient memory management.

tribev2: pretrained models for predicting brain responses to videos
https://ramdi.fr/github-stars/tribev2-pretrained-models-for-predicting-brain-responses-to-videos/
Mon, 04 May 2026 10:23:02 +0000
tribev2 offers pretrained models to predict brain responses to videos using cortical mesh modeling. It supports video, text, and audio inputs with easy inference setup.

In-Place TTT: Adaptive test-time training for transformer LLMs with in-place fast-weight updates
https://ramdi.fr/github-stars/in-place-ttt-adaptive-test-time-training-for-transformer-llms-with-in-place-fast-weight-updates/
Mon, 04 May 2026 10:23:01 +0000
ByteDance’s In-Place TTT enables adaptive transformer inference by updating MLP down-projection weights in place at test time, supporting long-context reasoning without extra modules.

NOVA3R: Non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images
https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/
Mon, 04 May 2026 10:23:01 +0000
NOVA3R implements a non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images, recovering occluded geometry with physical plausibility.

OmniStream: a multi-frame transformer for continuous video stream perception
https://ramdi.fr/github-stars/omnistream-a-multi-frame-transformer-for-continuous-video-stream-perception/
Mon, 04 May 2026 10:23:01 +0000
OmniStream uses a multi-frame transformer to process continuous video streams with patch-level temporal indexing, supporting downstream vision-language-action tasks.

annotated_deep_learning_paper_implementations: annotated PyTorch implementations of key deep learning papers
https://ramdi.fr/github-stars/annotated-deep-learning-paper-implementations-annotated-pytorch-implementations-of-key-deep-learning-papers/
Sat, 02 May 2026 20:07:04 +0000
This repo provides annotated PyTorch implementations of major deep learning papers with side-by-side explanations, aiding understanding and prototyping.

LlamaFactory: modular, extensible fine-tuning framework for large language models
https://ramdi.fr/github-stars/llamafactory-modular-extensible-fine-tuning-framework-for-large-language-models/
Sat, 02 May 2026 20:07:04 +0000
LlamaFactory offers a modular Python framework for fine-tuning 100+ LLMs with diverse algorithms and optimizations, including LoRA, QLoRA, and reinforcement learning.

ComfyUI: modular visual workflows for diffusion model experimentation
https://ramdi.fr/github-stars/comfyui-modular-visual-workflows-for-diffusion-model-experimentation/
Sun, 26 Apr 2026 17:51:11 +0000
ComfyUI offers a graph/node interface for building complex diffusion model workflows offline, blending modularity with flexibility for AI practitioners.

PyTorch's dynamic neural networks and tape-based autograd: a deep dive into flexible deep learning
https://ramdi.fr/github-stars/pytorch-s-dynamic-neural-networks-and-tape-based-autograd-a-deep-dive-into-flexible-deep-learning/
Sun, 26 Apr 2026 17:51:11 +0000
Explore PyTorch’s tape-based autograd and dynamic neural network architecture, which enable flexible model development and efficient GPU-accelerated tensor computation.

Hands-on with YOLOv5: A practical deep dive into Ultralytics' PyTorch vision model
https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/
Sun, 26 Apr 2026 09:31:26 +0000
YOLOv5 by Ultralytics offers an accessible, fast, and accurate PyTorch-based computer vision toolkit for object detection, segmentation, and classification. Explore its architecture, strengths, and quickstart usage.

Keras 3: Multi-backend deep learning framework simplifying model development across JAX, TensorFlow, and PyTorch
https://ramdi.fr/github-stars/keras-3-multi-backend-deep-learning-framework-simplifying-model-development-across-jax-tensorflow-and-pytorch/
Sun, 26 Apr 2026 09:31:26 +0000
Keras 3 introduces a multi-backend architecture supporting JAX, TensorFlow, PyTorch, and OpenVINO, enabling flexible, accelerated deep learning model development with up to 350% speedups.