A curated GPU performance engineering curriculum focusing on CUDA, kernel optimization, and NVIDIA architectures, guiding engineers from fundamentals to advanced production techniques.
deck.gl-raster streams and renders huge Cloud-Optimized GeoTIFFs entirely in-browser using WebGL2, avoiding servers and preprocessing. It enables fast, scalable geospatial visualization of raw raster data.
Mini-SGLang is a modular Python reimplementation of the SGLang LLM inference engine with production features like Radix Cache, chunked prefill, overlap scheduling, and tensor parallelism.
OCRFlux is a Python OCR tool optimized for NVIDIA GPUs that delivers fast, high-quality OCR on documents. It runs in a conda environment and relies on poppler-utils for PDF rendering.
SceneSmith uses GPT-5-powered agents to generate physically plausible 3D indoor scenes from text prompts, ready for robotics simulation without manual cleanup.
VisoMaster Fusion bundles over a dozen AI face-swapping models into a portable Windows desktop app with automatic runtime setup, simplifying the complex AI video editing workflow.
The ASRock AMD BC-250 mining board uses PS5-derived silicon with six Zen 2 cores and a 24-CU RDNA2 GPU sharing 16 GB of GDDR6. This repo documents community firmware mods and Linux GPU support.
TurboOCR is a C++/CUDA OCR server leveraging TensorRT FP16 for high throughput and low latency, featuring a zero-decode pixel pipeline and multi-protocol API.
NVIDIA Warp lets you write Python functions JIT-compiled into CUDA kernels for GPU-accelerated differentiable physics and ML integration, simplifying GPU programming in Python.
AniGen is a Linux-only Python project for 3D animation generation using NVIDIA GPUs and CUDA. It integrates PyTorch, spconv, and pytorch3d, with a setup script that handles the complex dependencies.
Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on RTX 3090 and newer NVIDIA GPUs.
NVIDIA's open GPU kernel modules split driver code into pre-built OS-agnostic binaries and thin kernel interface layers, so only the interface layers need rebuilding when the Linux kernel is updated.
SpinalVoodoo rebuilds the classic 3dfx Voodoo Graphics GPU in SpinalHDL, targeting FPGA synthesis and cycle-accurate simulation with a focus on perspective-corrected texture mapping and fixed-point interpolation.
claude-shorts uses AI scoring, GPU transcription, and adaptive video reframing to extract viral-ready vertical clips from long videos, optimizing cuts with audio-aware snapping and platform-specific encoding.
Cupid is a feed-forward 3D reconstruction model that jointly estimates camera pose and reconstructs 3D objects from single 2D images, outputting textured 3D meshes and radiance fields in seconds.
DeepEP is a CUDA-based communication library designed for Mixture-of-Experts models, delivering high-throughput GPU kernels with NVLink and RDMA support for efficient expert parallelism.
vLLM is a Python library for high-throughput LLM inference using paged attention and continuous batching. It supports quantization, distributed inference, and an OpenAI-compatible API.
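
To make the paged attention idea in the vLLM entry concrete, here is a minimal sketch of a block table that stores each sequence's KV cache in fixed-size blocks rather than one contiguous buffer. It is a toy illustration, not vLLM's implementation: the `PagedKVCache` class, its methods, and the block sizes are all hypothetical names chosen for this example.

```python
class PagedKVCache:
    """Toy block allocator (hypothetical, not vLLM's API).

    Maps each sequence to a list of fixed-size physical blocks, so memory
    is claimed one block at a time instead of reserved up front.
    """

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids not in use
        self.block_tables = {}  # sequence id -> list of physical block ids
        self.lengths = {}       # sequence id -> number of tokens stored

    def append_token(self, seq_id: str) -> tuple:
        """Reserve a slot for one new token; return (physical_block, offset)."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:  # no block yet, or the last one is full
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id: str) -> None:
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots_a = [cache.append_token("a") for _ in range(3)]
slot_b = cache.append_token("b")
cache.free("a")
```

Because blocks return to the pool as soon as a sequence finishes, a scheduler can admit new requests mid-batch, which is the property continuous batching depends on.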