Gpu on Noureddine RAMDI

https://ramdi.fr/tags/gpu/
Recent content in Gpu on Noureddine RAMDI
Sat, 23 May 2026 20:41:27 +0000

A structured GPU performance engineering curriculum from fundamentals to frontier labs
https://ramdi.fr/github-stars/a-structured-gpu-performance-engineering-curriculum-from-fundamentals-to-frontier-labs/
Sat, 23 May 2026 20:41:14 +0000
A curated GPU performance engineering curriculum focusing on CUDA, kernel optimization, and NVIDIA architectures, guiding engineers from fundamentals to advanced production techniques.

deck.gl-raster: GPU-accelerated client-side rendering of massive geospatial rasters
https://ramdi.fr/github-stars/deck-gl-raster-gpu-accelerated-client-side-rendering-of-massive-geospatial-rasters/
Sat, 23 May 2026 20:41:14 +0000
deck.gl-raster streams and renders huge Cloud-Optimized GeoTIFFs entirely in-browser using WebGL2, avoiding servers and preprocessing.
It enables fast, scalable geospatial visualization of raw raster data.

Inside Mini-SGLang: A clear and modular Python LLM inference engine
https://ramdi.fr/github-stars/inside-mini-sglang-a-clear-and-modular-python-llm-inference-engine/
Sat, 23 May 2026 20:41:14 +0000
Mini-SGLang is a modular Python reimplementation of the SGLang LLM inference engine with production features like Radix Cache, chunked prefill, overlap scheduling, and tensor parallelism.

OCRFlux: GPU-Accelerated OCR with Python for High-Performance Document Processing
https://ramdi.fr/github-stars/ocrflux-gpu-accelerated-ocr-with-python-for-high-performance-document-processing/
Sat, 23 May 2026 20:41:14 +0000
OCRFlux is a Python OCR tool optimized for NVIDIA GPUs, enabling fast, high-quality OCR on documents using a conda environment and poppler-utils for PDF rendering.

SceneSmith: AI-driven pipeline for physics-ready 3D indoor scene generation from text
https://ramdi.fr/github-stars/scenesmith-ai-driven-pipeline-for-physics-ready-3d-indoor-scene-generation-from-text/
Sat, 23 May 2026 20:41:14 +0000
SceneSmith uses GPT-5-powered agents to generate physically plausible 3D indoor scenes from text prompts, ready for robotics simulation without manual cleanup.

VisoMaster Fusion: a portable Windows app bundling multiple AI face-swapping models
https://ramdi.fr/github-stars/visomaster-fusion-a-portable-windows-app-bundling-multiple-ai-face-swapping-models/
Sat, 23 May 2026 20:41:14 +0000
VisoMaster Fusion bundles over a dozen AI face-swapping models into a portable Windows desktop app with
automatic runtime setup, simplifying the complex AI video editing workflow.

Repurposing the ASRock AMD BC-250: Community-driven firmware unlocking on PS5-derived silicon
https://ramdi.fr/github-stars/repurposing-the-asrock-amd-bc-250-community-driven-firmware-unlocking-on-ps5-derived-silicon/
Tue, 05 May 2026 16:46:42 +0000
The ASRock AMD BC-250 mining board uses PS5-derived silicon with 6 Zen 2 cores and a 24CU RDNA2 GPU sharing 16GB GDDR6. This repo documents community firmware mods and Linux GPU support.

TurboOCR: a GPU-accelerated OCR server optimized for raw pixel input and high throughput
https://ramdi.fr/github-stars/turboocr-a-gpu-accelerated-ocr-server-optimized-for-raw-pixel-input-and-high-throughput/
Tue, 05 May 2026 13:37:39 +0000
TurboOCR is a C++/CUDA OCR server leveraging TensorRT FP16 for high throughput and low latency, featuring a zero-decode pixel pipeline and multi-protocol API.

NVIDIA Warp: JIT-compiling Python for CUDA-powered differentiable physics
https://ramdi.fr/github-stars/nvidia-warp-jit-compiling-python-for-cuda-powered-differentiable-physics/
Mon, 04 May 2026 10:23:03 +0000
NVIDIA Warp lets you write Python functions JIT-compiled into CUDA kernels for GPU-accelerated differentiable physics and ML integration, simplifying GPU programming in Python.

AniGen: GPU-accelerated 3D animation generation with Python and CUDA
https://ramdi.fr/github-stars/anigen-gpu-accelerated-3d-animation-generation-with-python-and-cuda/
Mon, 04 May 2026 10:23:02 +0000
AniGen is a Linux-only Python project for 3D animation generation using
NVIDIA GPUs and CUDA. It integrates PyTorch, spconv, and pytorch3d with a smooth setup script for complex dependencies.

Lucebox Hub: hand-optimized CUDA kernels for efficient LLM inference on RTX 3090 and beyond
https://ramdi.fr/github-stars/lucebox-hub-hand-optimized-cuda-kernels-for-efficient-llm-inference-on-rtx-3090-and-beyond/
Mon, 04 May 2026 10:23:02 +0000
Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on RTX 3090 and newer NVIDIA GPUs.

NVIDIA open GPU kernel modules: a pragmatic architecture for Linux GPU drivers
https://ramdi.fr/github-stars/nvidia-open-gpu-kernel-modules-a-pragmatic-architecture-for-linux-gpu-drivers/
Mon, 04 May 2026 10:23:02 +0000
NVIDIA’s open GPU kernel modules split driver code into pre-built OS-agnostic binaries and thin kernel interface layers, avoiding recompilation on Linux kernel updates.
Here’s how it works.

Recreating the 3dfx Voodoo GPU in SpinalHDL for FPGA and cycle-accurate simulation
https://ramdi.fr/github-stars/recreating-the-3dfx-voodoo-gpu-in-spinalhdl-for-fpga-and-cycle-accurate-simulation/
Mon, 04 May 2026 10:23:02 +0000
SpinalVoodoo rebuilds the classic 3dfx Voodoo Graphics GPU in SpinalHDL, targeting FPGA synthesis and cycle-accurate simulation with a focus on perspective-corrected texture mapping and fixed-point interpolation.

claude-shorts: AI-driven pipeline for viral vertical video clips from long form content
https://ramdi.fr/github-stars/claude-shorts-ai-driven-pipeline-for-viral-vertical-video-clips-from-long-form-content/
Mon, 04 May 2026 10:23:01 +0000
claude-shorts uses AI scoring, GPU transcription, and adaptive video reframing to extract viral-ready vertical clips from long videos, optimizing cuts with audio-aware snapping and platform-specific encoding.

Cupid: feed-forward 3D reconstruction with joint camera pose estimation from single images
https://ramdi.fr/github-stars/cupid-feed-forward-3d-reconstruction-with-joint-camera-pose-estimation-from-single-images/
Mon, 04 May 2026 10:23:01 +0000
Cupid is a feed-forward 3D reconstruction model that jointly estimates camera pose and reconstructs 3D objects from single 2D images, outputting textured 3D meshes and radiance fields in seconds.

DeepEP: Optimizing communication for large Mixture-of-Experts models with CUDA kernels
https://ramdi.fr/github-stars/deepep-optimizing-communication-for-large-mixture-of-experts-models-with-cuda-kernels/
Sat, 02 May 2026 20:07:04 +0000
DeepEP is a CUDA-based communication library designed for Mixture-of-Experts models, delivering high-throughput GPU kernels with NVLink and RDMA support for efficient expert parallelism.

vLLM: Efficient large language model serving with paged attention and continuous batching
https://ramdi.fr/github-stars/vllm-efficient-large-language-model-serving-with-paged-attention-and-continuous-batching/
Sat, 02 May 2026 20:07:04 +0000
vLLM is a Python library for high-throughput LLM inference using paged attention and continuous batching. It supports quantization, distributed inference, and an OpenAI-compatible API.