<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Pytorch on Noureddine RAMDI</title><link>https://ramdi.fr/tags/pytorch/</link><description>Recent content in Pytorch on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/pytorch/index.xml" rel="self" type="application/rss+xml"/><item><title>APISR: a Python toolkit for AI-based image and video super-resolution with practical inference modes</title><link>https://ramdi.fr/github-stars/apisr-a-python-toolkit-for-ai-based-image-and-video-super-resolution-with-practical-inference-modes/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/apisr-a-python-toolkit-for-ai-based-image-and-video-super-resolution-with-practical-inference-modes/</guid><description>APISR is a Python repo for AI-powered image and video super-resolution, offering fast Gradio inference and full-featured regular inference with dataset curation tools.</description></item><item><title>DeepSpeed: scalable deep learning optimization with extensible hardware support</title><link>https://ramdi.fr/github-stars/deepspeed-scalable-deep-learning-optimization-with-extensible-hardware-support/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/deepspeed-scalable-deep-learning-optimization-with-extensible-hardware-support/</guid><description>DeepSpeed is a Python library that optimizes large-scale deep learning training with multi-hardware support and JIT CUDA extensions. 
Explore its architecture, strengths, and quick installation.</description></item><item><title>DualSDF: A two-level signed distance function approach for semantic 3D shape manipulation</title><link>https://ramdi.fr/github-stars/dualsdf-a-two-level-signed-distance-function-approach-for-semantic-3d-shape-manipulation/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/dualsdf-a-two-level-signed-distance-function-approach-for-semantic-3d-shape-manipulation/</guid><description>DualSDF separates coarse semantic structure from fine geometric detail in 3D shape modeling using a two-level signed distance function. It enables intuitive shape edits with pretrained models and a WebGL demo.</description></item><item><title>Fast3R: scalable multi-view 3D reconstruction with a single forward pass</title><link>https://ramdi.fr/github-stars/fast3r-scalable-multi-view-3d-reconstruction-with-a-single-forward-pass/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/fast3r-scalable-multi-view-3d-reconstruction-with-a-single-forward-pass/</guid><description>Fast3R from Meta FAIR processes 1000+ unordered images simultaneously for 3D reconstruction using a ViT-Large backbone and multi-view attention, eliminating iterative matching.</description></item><item><title>Hivemind: decentralized peer-to-peer deep learning with PyTorch</title><link>https://ramdi.fr/github-stars/hivemind-decentralized-peer-to-peer-deep-learning-with-pytorch/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/hivemind-decentralized-peer-to-peer-deep-learning-with-pytorch/</guid><description>Hivemind is a PyTorch library enabling decentralized deep learning over the internet using a peer-to-peer Distributed Hash Table (DHT). 
It supports fault-tolerant training and decentralized parameter averaging without global sync.</description></item><item><title>MASt3R-SLAM: integrating foundation-model 3D priors into real-time dense SLAM</title><link>https://ramdi.fr/github-stars/mast3r-slam-integrating-foundation-model-3d-priors-into-real-time-dense-slam/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/mast3r-slam-integrating-foundation-model-3d-priors-into-real-time-dense-slam/</guid><description>MASt3R-SLAM integrates a pretrained 3D reconstruction model as a geometry prior in a dense SLAM pipeline, enabling real-time tracking and mapping without classical bundle adjustment or depth sensors.</description></item><item><title>OmniGen2: a unified multimodal generation model with separate decoding paths for text and images</title><link>https://ramdi.fr/github-stars/omnigen2-a-unified-multimodal-generation-model-with-separate-decoding-paths-for-text-and-images/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/omnigen2-a-unified-multimodal-generation-model-with-separate-decoding-paths-for-text-and-images/</guid><description>OmniGen2 unifies visual understanding, text-to-image generation, and image editing using distinct decoding pathways for text and images, built on Qwen-VL-2.5 with CPU offloading for accessibility.</description></item><item><title>PartCrafter: compositional 3D mesh generation with latent diffusion transformers</title><link>https://ramdi.fr/github-stars/partcrafter-compositional-3d-mesh-generation-with-latent-diffusion-transformers/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/partcrafter-compositional-3d-mesh-generation-with-latent-diffusion-transformers/</guid><description>PartCrafter generates multiple semantically distinct 3D mesh parts from a single RGB image using latent diffusion transformers, enabling structured 3D generation with pretrained 
models and VLM-based part suggestions.</description></item><item><title>SVFR: unified video face restoration with task-conditioned stable video diffusion</title><link>https://ramdi.fr/github-stars/svfr-unified-video-face-restoration-with-task-conditioned-stable-video-diffusion/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/svfr-unified-video-face-restoration-with-task-conditioned-stable-video-diffusion/</guid><description>SVFR combines blind face restoration, colorization, and inpainting in a single stable video diffusion model, enabling efficient multi-task video face enhancement.</description></item><item><title>CodeFormer: Deep learning-based blind face restoration with fidelity control</title><link>https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/codeformer-deep-learning-based-blind-face-restoration-with-fidelity-control/</guid><description>CodeFormer uses a codebook transformer architecture for blind face restoration, letting users control the tradeoff between quality and fidelity with a unique fidelity weight parameter.</description></item><item><title>AniGen: GPU-accelerated 3D animation generation with Python and CUDA</title><link>https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/</guid><description>AniGen is a Linux-only Python project for 3D animation generation using NVIDIA GPUs and CUDA. 
It integrates PyTorch, spconv, and pytorch3d with a smooth setup script for complex dependencies.</description></item><item><title>ComfyUI Trellis2: Extending ComfyUI with Dinov3 for 3D-Aware Diffusion Workflows</title><link>https://ramdi.fr/github-stars/comfyui-trellis2-extending-comfyui-with-dinov3-for-3d-aware-diffusion-workflows/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/comfyui-trellis2-extending-comfyui-with-dinov3-for-3d-aware-diffusion-workflows/</guid><description>ComfyUI-Trellis2 integrates Facebook&amp;rsquo;s Dinov3 model into ComfyUI for advanced 3D-aware diffusion workflows. This article breaks down its architecture, strengths, and installation steps.</description></item><item><title>DIMO: Distilling Diverse 3D Motion Priors for Arbitrary Object Motion Synthesis</title><link>https://ramdi.fr/github-stars/dimo-distilling-diverse-3d-motion-priors-for-arbitrary-object-motion-synthesis/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/dimo-distilling-diverse-3d-motion-priors-for-arbitrary-object-motion-synthesis/</guid><description>DIMO distills motion priors from text-conditioned and multi-view video models into a shared latent space, enabling diverse 3D motion generation for arbitrary objects using 3D Gaussian splatting and 4D rendering.</description></item><item><title>DROID-W: extending SLAM to dynamic, in-the-wild scenes with uncertainty estimation</title><link>https://ramdi.fr/github-stars/droid-w-extending-slam-to-dynamic-in-the-wild-scenes-with-uncertainty-estimation/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/droid-w-extending-slam-to-dynamic-in-the-wild-scenes-with-uncertainty-estimation/</guid><description>DROID-W builds on DROID-SLAM to handle dynamic scenes in-the-wild by jointly estimating camera pose, scene structure, and dynamic uncertainty using Lie group optimization and metric depth 
estimation.</description></item><item><title>Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR</title><link>https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/</guid><description>Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.</description></item><item><title>Omni-Diffusion: unified any-to-any multimodal generation with masked discrete diffusion</title><link>https://ramdi.fr/github-stars/omni-diffusion-unified-any-to-any-multimodal-generation-with-masked-discrete-diffusion/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/omni-diffusion-unified-any-to-any-multimodal-generation-with-masked-discrete-diffusion/</guid><description>Omni-Diffusion models text, image, and speech tokens jointly via masked discrete diffusion, enabling any-to-any multimodal generation with a single unified model.</description></item><item><title>PEAR: real-time expressive 3D human mesh recovery at 100 FPS</title><link>https://ramdi.fr/github-stars/pear-real-time-expressive-3d-human-mesh-recovery-at-100-fps/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/pear-real-time-expressive-3d-human-mesh-recovery-at-100-fps/</guid><description>PEAR predicts expressive 3D human mesh parameters for body, hands, and face simultaneously at 100 FPS using a pixel-aligned architecture based on PyTorch and SMPL-X models.</description></item><item><title>Streaming 3D scene reconstruction with LingBot-Map’s geometric context 
transformer</title><link>https://ramdi.fr/github-stars/streaming-3d-scene-reconstruction-with-lingbot-maps-geometric-context-transformer/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/streaming-3d-scene-reconstruction-with-lingbot-maps-geometric-context-transformer/</guid><description>LingBot-Map performs streaming 3D reconstruction from long image sequences at ~20 FPS using a geometric context transformer and paged KV cache attention for efficient memory management.</description></item><item><title>tribev2: pretrained models for predicting brain responses to videos</title><link>https://ramdi.fr/github-stars/tribev2-pretrained-models-for-predicting-brain-responses-to-videos/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/tribev2-pretrained-models-for-predicting-brain-responses-to-videos/</guid><description>tribev2 offers pretrained models to predict brain responses to videos using cortical mesh modeling. 
It supports video, text, and audio inputs with easy inference setup.</description></item><item><title>In-Place TTT: Adaptive test-time training for transformer LLMs with in-place fast-weight updates</title><link>https://ramdi.fr/github-stars/in-place-ttt-adaptive-test-time-training-for-transformer-llms-with-in-place-fast-weight-updates/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/in-place-ttt-adaptive-test-time-training-for-transformer-llms-with-in-place-fast-weight-updates/</guid><description>ByteDance&amp;rsquo;s In-Place TTT enables adaptive transformer inference by updating MLP down-projection weights in-place at test time, supporting long-context reasoning without extra modules.</description></item><item><title>NOVA3R: Non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images</title><link>https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/nova3r-non-pixel-aligned-visual-transformer-for-amodal-3d-reconstruction-from-unposed-multi-view-images/</guid><description>NOVA3R implements a non-pixel-aligned visual transformer for amodal 3D reconstruction from unposed multi-view images, recovering occluded geometry with physical plausibility.</description></item><item><title>OmniStream: a multi-frame transformer for continuous video stream perception</title><link>https://ramdi.fr/github-stars/omnistream-a-multi-frame-transformer-for-continuous-video-stream-perception/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/omnistream-a-multi-frame-transformer-for-continuous-video-stream-perception/</guid><description>OmniStream uses a multi-frame transformer to process continuous video streams with patch-level temporal indexing, supporting downstream vision-language-action 
tasks.</description></item><item><title>annotated_deep_learning_paper_implementations: annotated PyTorch implementations of key deep learning papers</title><link>https://ramdi.fr/github-stars/annotated-deep-learning-paper-implementations-annotated-pytorch-implementations-of-key-deep-learning-papers/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/annotated-deep-learning-paper-implementations-annotated-pytorch-implementations-of-key-deep-learning-papers/</guid><description>This repo provides annotated PyTorch implementations of major deep learning papers with side-by-side explanations, aiding understanding and prototyping.</description></item><item><title>LlamaFactory: modular, extensible fine-tuning framework for large language models</title><link>https://ramdi.fr/github-stars/llamafactory-modular-extensible-fine-tuning-framework-for-large-language-models/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/llamafactory-modular-extensible-fine-tuning-framework-for-large-language-models/</guid><description>LlamaFactory offers a modular Python framework for fine-tuning 100+ LLMs with diverse algorithms and optimizations, including LoRA, QLoRA, and reinforcement learning.</description></item><item><title>ComfyUI: modular visual workflows for diffusion model experimentation</title><link>https://ramdi.fr/github-stars/comfyui-modular-visual-workflows-for-diffusion-model-experimentation/</link><pubDate>Sun, 26 Apr 2026 17:51:11 +0000</pubDate><guid>https://ramdi.fr/github-stars/comfyui-modular-visual-workflows-for-diffusion-model-experimentation/</guid><description>ComfyUI offers a graph/node interface for building complex diffusion model workflows offline, blending modularity with flexibility for AI practitioners.</description></item><item><title>PyTorch's dynamic neural networks and tape-based autograd: a deep dive into flexible deep 
learning</title><link>https://ramdi.fr/github-stars/pytorch-s-dynamic-neural-networks-and-tape-based-autograd-a-deep-dive-into-flexible-deep-learning/</link><pubDate>Sun, 26 Apr 2026 17:51:11 +0000</pubDate><guid>https://ramdi.fr/github-stars/pytorch-s-dynamic-neural-networks-and-tape-based-autograd-a-deep-dive-into-flexible-deep-learning/</guid><description>Explore PyTorch&amp;rsquo;s unique tape-based autograd and dynamic neural networks architecture that enables flexible model development and efficient GPU-accelerated tensor computation.</description></item><item><title>Hands-on with YOLOv5: A practical deep dive into Ultralytics' PyTorch vision model</title><link>https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/</link><pubDate>Sun, 26 Apr 2026 09:31:26 +0000</pubDate><guid>https://ramdi.fr/github-stars/hands-on-with-yolov5-a-practical-deep-dive-into-ultralytics-pytorch-vision-model/</guid><description>YOLOv5 by Ultralytics offers an accessible, fast, and accurate PyTorch-based computer vision toolkit for object detection, segmentation, and classification. Explore its architecture, strengths, and quickstart usage.</description></item><item><title>Keras 3: Multi-backend deep learning framework simplifying model development across JAX, TensorFlow, and PyTorch</title><link>https://ramdi.fr/github-stars/keras-3-multi-backend-deep-learning-framework-simplifying-model-development-across-jax-tensorflow-and-pytorch/</link><pubDate>Sun, 26 Apr 2026 09:31:26 +0000</pubDate><guid>https://ramdi.fr/github-stars/keras-3-multi-backend-deep-learning-framework-simplifying-model-development-across-jax-tensorflow-and-pytorch/</guid><description>Keras 3 introduces a multi-backend architecture supporting JAX, TensorFlow, PyTorch, and OpenVINO, enabling flexible, accelerated deep learning model development with up to 350% speedups.</description></item></channel></rss>