<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Gpu on Noureddine RAMDI</title><link>https://ramdi.fr/tags/gpu/</link><description>Recent content in Gpu on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/gpu/index.xml" rel="self" type="application/rss+xml"/><item><title>A structured GPU performance engineering curriculum from fundamentals to frontier labs</title><link>https://ramdi.fr/github-stars/a-structured-gpu-performance-engineering-curriculum-from-fundamentals-to-frontier-labs/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/a-structured-gpu-performance-engineering-curriculum-from-fundamentals-to-frontier-labs/</guid><description>A curated GPU performance engineering curriculum focusing on CUDA, kernel optimization, and NVIDIA architectures, guiding engineers from fundamentals to advanced production techniques.</description></item><item><title>deck.gl-raster: GPU-accelerated client-side rendering of massive geospatial rasters</title><link>https://ramdi.fr/github-stars/deck-gl-raster-gpu-accelerated-client-side-rendering-of-massive-geospatial-rasters/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/deck-gl-raster-gpu-accelerated-client-side-rendering-of-massive-geospatial-rasters/</guid><description>deck.gl-raster streams and renders huge Cloud-Optimized GeoTIFFs entirely in-browser using WebGL2, avoiding servers and preprocessing. 
It enables fast, scalable geospatial visualization of raw raster data.</description></item><item><title>Inside Mini-SGLang: A clear and modular Python LLM inference engine</title><link>https://ramdi.fr/github-stars/inside-mini-sglang-a-clear-and-modular-python-llm-inference-engine/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/inside-mini-sglang-a-clear-and-modular-python-llm-inference-engine/</guid><description>Mini-SGLang is a modular Python reimplementation of the SGLang LLM inference engine with production features like Radix Cache, chunked prefill, overlap scheduling, and tensor parallelism.</description></item><item><title>OCRFlux: GPU-Accelerated OCR with Python for High-Performance Document Processing</title><link>https://ramdi.fr/github-stars/ocrflux-gpu-accelerated-ocr-with-python-for-high-performance-document-processing/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/ocrflux-gpu-accelerated-ocr-with-python-for-high-performance-document-processing/</guid><description>OCRFlux is a Python OCR tool optimized for NVIDIA GPUs, enabling fast, high-quality OCR on documents using a conda environment and poppler-utils for PDF rendering.</description></item><item><title>SceneSmith: AI-driven pipeline for physics-ready 3D indoor scene generation from text</title><link>https://ramdi.fr/github-stars/scenesmith-ai-driven-pipeline-for-physics-ready-3d-indoor-scene-generation-from-text/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/scenesmith-ai-driven-pipeline-for-physics-ready-3d-indoor-scene-generation-from-text/</guid><description>SceneSmith uses GPT-5-powered agents to generate physically plausible 3D indoor scenes from text prompts, ready for robotics simulation without manual cleanup.</description></item><item><title>VisoMaster Fusion: a portable Windows app bundling multiple AI face-swapping 
models</title><link>https://ramdi.fr/github-stars/visomaster-fusion-a-portable-windows-app-bundling-multiple-ai-face-swapping-models/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/visomaster-fusion-a-portable-windows-app-bundling-multiple-ai-face-swapping-models/</guid><description>VisoMaster Fusion bundles over a dozen AI face-swapping models into a portable Windows desktop app with automatic runtime setup, simplifying the complex AI video editing workflow.</description></item><item><title>Repurposing the ASRock AMD BC-250: Community-driven firmware unlocking on PS5-derived silicon</title><link>https://ramdi.fr/github-stars/repurposing-the-asrock-amd-bc-250-community-driven-firmware-unlocking-on-ps5-derived-silicon/</link><pubDate>Tue, 05 May 2026 16:46:42 +0000</pubDate><guid>https://ramdi.fr/github-stars/repurposing-the-asrock-amd-bc-250-community-driven-firmware-unlocking-on-ps5-derived-silicon/</guid><description>The ASRock AMD BC-250 mining board uses PS5-derived silicon with 6 Zen 2 cores and a 24CU RDNA2 GPU sharing 16GB GDDR6. 
This repo documents community firmware mods and Linux GPU support.</description></item><item><title>TurboOCR: a GPU-accelerated OCR server optimized for raw pixel input and high throughput</title><link>https://ramdi.fr/github-stars/turboocr-a-gpu-accelerated-ocr-server-optimized-for-raw-pixel-input-and-high-throughput/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/turboocr-a-gpu-accelerated-ocr-server-optimized-for-raw-pixel-input-and-high-throughput/</guid><description>TurboOCR is a C++/CUDA OCR server leveraging TensorRT FP16 for high throughput and low latency, featuring a zero-decode pixel pipeline and multi-protocol API.</description></item><item><title>NVIDIA Warp: JIT-compiling Python for CUDA-powered differentiable physics</title><link>https://ramdi.fr/github-stars/nvidia-warp-jit-compiling-python-for-cuda-powered-differentiable-physics/</link><pubDate>Mon, 04 May 2026 10:23:03 +0000</pubDate><guid>https://ramdi.fr/github-stars/nvidia-warp-jit-compiling-python-for-cuda-powered-differentiable-physics/</guid><description>NVIDIA Warp lets you write Python functions JIT-compiled into CUDA kernels for GPU-accelerated differentiable physics and ML integration, simplifying GPU programming in Python.</description></item><item><title>AniGen: GPU-accelerated 3D animation generation with Python and CUDA</title><link>https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/</guid><description>AniGen is a Linux-only Python project for 3D animation generation using NVIDIA GPUs and CUDA. 
It integrates PyTorch, spconv, and pytorch3d, with a setup script that handles their complex dependencies.</description></item><item><title>Lucebox Hub: hand-optimized CUDA kernels for efficient LLM inference on RTX 3090 and beyond</title><link>https://ramdi.fr/github-stars/lucebox-hub-hand-optimized-cuda-kernels-for-efficient-llm-inference-on-rtx-3090-and-beyond/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/lucebox-hub-hand-optimized-cuda-kernels-for-efficient-llm-inference-on-rtx-3090-and-beyond/</guid><description>Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on RTX 3090 and newer NVIDIA GPUs.</description></item><item><title>NVIDIA open GPU kernel modules: a pragmatic architecture for Linux GPU drivers</title><link>https://ramdi.fr/github-stars/nvidia-open-gpu-kernel-modules-a-pragmatic-architecture-for-linux-gpu-drivers/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/nvidia-open-gpu-kernel-modules-a-pragmatic-architecture-for-linux-gpu-drivers/</guid><description>NVIDIA&amp;rsquo;s open GPU kernel modules split driver code into pre-built OS-agnostic binaries and thin kernel interface layers, avoiding recompilation on Linux kernel updates.
Here’s how it works.</description></item><item><title>Recreating the 3dfx Voodoo GPU in SpinalHDL for FPGA and cycle-accurate simulation</title><link>https://ramdi.fr/github-stars/recreating-the-3dfx-voodoo-gpu-in-spinalhdl-for-fpga-and-cycle-accurate-simulation/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/recreating-the-3dfx-voodoo-gpu-in-spinalhdl-for-fpga-and-cycle-accurate-simulation/</guid><description>SpinalVoodoo rebuilds the classic 3dfx Voodoo Graphics GPU in SpinalHDL, targeting FPGA synthesis and cycle-accurate simulation with a focus on perspective-corrected texture mapping and fixed-point interpolation.</description></item><item><title>claude-shorts: AI-driven pipeline for viral vertical video clips from long form content</title><link>https://ramdi.fr/github-stars/claude-shorts-ai-driven-pipeline-for-viral-vertical-video-clips-from-long-form-content/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/claude-shorts-ai-driven-pipeline-for-viral-vertical-video-clips-from-long-form-content/</guid><description>claude-shorts uses AI scoring, GPU transcription, and adaptive video reframing to extract viral-ready vertical clips from long videos, optimizing cuts with audio-aware snapping and platform-specific encoding.</description></item><item><title>Cupid: feed-forward 3D reconstruction with joint camera pose estimation from single images</title><link>https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/</guid><description>Cupid is a feed-forward 3D reconstruction model that jointly estimates camera pose and reconstructs 3D objects from single 2D images, outputting textured 3D meshes and radiance fields in 
seconds.</description></item><item><title>DeepEP: Optimizing communication for large Mixture-of-Experts models with CUDA kernels</title><link>https://ramdi.fr/github-stars/deepep-optimizing-communication-for-large-mixture-of-experts-models-with-cuda-kernels/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/deepep-optimizing-communication-for-large-mixture-of-experts-models-with-cuda-kernels/</guid><description>DeepEP is a CUDA-based communication library designed for Mixture-of-Experts models, delivering high-throughput GPU kernels with NVLink and RDMA support for efficient expert parallelism.</description></item><item><title>vLLM: Efficient large language model serving with paged attention and continuous batching</title><link>https://ramdi.fr/github-stars/vllm-efficient-large-language-model-serving-with-paged-attention-and-continuous-batching/</link><pubDate>Sat, 02 May 2026 20:07:04 +0000</pubDate><guid>https://ramdi.fr/github-stars/vllm-efficient-large-language-model-serving-with-paged-attention-and-continuous-batching/</guid><description>vLLM is a Python library for high-throughput LLM inference using paged attention and continuous batching. It supports quantization, distributed inference, and an OpenAI-compatible API.</description></item></channel></rss>