Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation


18 results for CUDA


SegmentAnything3D: zero-shot 3D segmentation by projecting 2D masks into point clouds
SegmentAnything3D transfers Meta’s 2D Segment Anything Model to 3D point clouds without 3D training, using depth-based mask projection and graph-based merging for zero-shot 3D segmentation.
github-stars python 3d-segmentation point-cloud depth-mapping Created Mon, 06 Jul 2026 15:15:52 +0000
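The "depth-based mask projection" step can be pictured as standard pinhole back-projection. The sketch below is a generic illustration, not SegmentAnything3D's code. It assumes known intrinsics (fx, fy, cx, cy), a metric depth map aligned with the RGB frame, and a camera-to-world pose. The helper names `backproject_mask` and `camera_to_world` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject_mask(depth: Sequence[Sequence[float]],
                     mask: Sequence[Sequence[bool]],
                     fx: float, fy: float, cx: float, cy: float) -> List[Point3]:
    """Lift the pixels of one 2D mask into 3D camera-frame points (pinhole model)."""
    points: List[Point3] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            # Skip pixels outside the mask and pixels with missing depth (z == 0).
            if inside and z > 0:
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def camera_to_world(points: List[Point3], rotation: Sequence[Sequence[float]],
                    translation: Point3) -> List[Point3]:
    """Apply a rigid camera-to-world pose to each point: p_w = R @ p_c + t."""
    return [
        tuple(sum(rotation[i][j] * p[j] for j in range(3)) + translation[i] for i in range(3))
        for p in points
    ]
```

Per the description, the project then merges the per-frame projected masks with graph-based merging. That step is not sketched here.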
Transcribe-Anything: a unified Whisper AI frontend with multi-backend support and speaker diarization
Transcribe-Anything offers a unified CLI and Python API for Whisper AI transcription and selects an optimized backend for CUDA, Apple Silicon, or CPU. It also adds speaker diarization and supports GPU-enabled Docker deployment.
github-stars python whisper speech-to-text speaker-diarization Created Mon, 06 Jul 2026 15:15:52 +0000
A structured GPU performance engineering curriculum from fundamentals to frontier labs
A curated GPU performance engineering curriculum focusing on CUDA, kernel optimization, and NVIDIA architectures, guiding engineers from fundamentals to advanced production techniques.
github-stars gpu cuda cutlass triton Created Sat, 23 May 2026 20:41:14 +0000
DeepSpeed: scalable deep learning optimization with extensible hardware support
DeepSpeed is a Python library that optimizes large-scale deep learning training, with multi-hardware support and JIT-compiled CUDA extensions. The post covers its architecture, strengths, and installation.
github-stars python deep-learning pytorch cuda Created Sat, 23 May 2026 20:41:14 +0000
DualSDF: A two-level signed distance function approach for semantic 3D shape manipulation
DualSDF separates coarse semantic structure from fine geometric detail in 3D shape modeling using a two-level signed distance function. It enables intuitive shape editing and comes with pretrained models and a WebGL demo.
github-stars 3d pytorch signed-distance-function shape-manipulation Created Sat, 23 May 2026 20:41:14 +0000
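A signed distance function returns a negative value inside a shape, zero on its surface, and a positive value outside. As an illustration only, the sketch below assumes a coarse level built from sphere primitives, where the SDF of a union is the pointwise minimum of the primitive SDFs. It does not reproduce DualSDF's learned model, and `sphere_sdf` and `union_sdf` are hypothetical helpers.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sphere_sdf(p: Vec3, center: Vec3, radius: float) -> float:
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(p, center) - radius


def union_sdf(p: Vec3, spheres: Sequence[Tuple[Vec3, float]]) -> float:
    """Union of primitives: the minimum of their SDFs.

    The sign is correct everywhere and the distance is exact outside the shape;
    inside overlapping primitives it only bounds the true depth.
    """
    return min(sphere_sdf(p, center, radius) for center, radius in spheres)
```

In a primitive-based coarse level, an edit such as moving or resizing one sphere changes the zero level set locally, which is the kind of semantic manipulation the description refers to.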
GS-Playground: High-throughput photorealistic simulation for vision-based robot learning
GS-Playground combines 3D Gaussian Splatting rendering with a velocity-impulse physics engine to enable large-scale visual reinforcement learning at up to 10⁴ FPS. The current preview release includes the core simulation API and demos.
github-stars robotics simulation reinforcement-learning 3d-gaussian-splatting Created Sat, 23 May 2026 20:41:14 +0000
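GS-Playground's engine is not reproduced here. This is a generic single-body sketch of velocity-level impulse contact, the family of methods that the "velocity-impulse" label suggests: forces update velocity first, then a contact impulse corrects the velocity, and only then is position integrated. `Body`, `step`, and the ground plane at y = 0 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Body:
    y: float           # height of the body above the ground plane y = 0
    vy: float = 0.0    # vertical velocity
    mass: float = 1.0


def step(body: Body, dt: float, gravity: float = -9.81) -> float:
    """Advance one step and return the contact impulse applied (0.0 if none)."""
    # 1. Integrate external forces into velocity (semi-implicit Euler).
    body.vy += gravity * dt
    # 2. Velocity-level contact: if this step would carry the body below the ground,
    #    apply an impulse so the new velocity exactly closes the gap (inelastic contact).
    impulse = 0.0
    if body.y + body.vy * dt < 0.0:
        target_vy = -body.y / dt
        impulse = body.mass * (target_vy - body.vy)
        body.vy += impulse / body.mass
    # 3. Integrate the corrected velocity into position.
    body.y += body.vy * dt
    return impulse
```

Working at the velocity level keeps each step a cheap, fixed amount of work per contact, which is one reason impulse-based solvers suit high-throughput batched simulation.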
Lynx: modular personalized video generation with dual adapters on a frozen diffusion transformer
Lynx generates personalized videos from a single image using a frozen Diffusion Transformer with ID and Ref adapters. This modular design balances fidelity and efficiency.
github-stars python video-generation diffusion-models transformers Created Sat, 23 May 2026 20:41:14 +0000
TurboOCR: a GPU-accelerated OCR server optimized for raw pixel input and high throughput
TurboOCR is a C++/CUDA OCR server that uses TensorRT FP16 for high throughput and low latency, featuring a zero-decode pixel pipeline and a multi-protocol API.
github-stars cpp cuda ocr tensorrt Created Tue, 05 May 2026 13:37:39 +0000
NVIDIA Warp: JIT-compiling Python for CUDA-powered differentiable physics
NVIDIA Warp lets you write Python functions that are JIT-compiled into CUDA kernels for GPU-accelerated differentiable physics and ML integration, simplifying GPU programming in Python.
github-stars python cuda gpu jit-compilation Created Mon, 04 May 2026 10:23:03 +0000
AniGen: GPU-accelerated 3D animation generation with Python and CUDA
AniGen is a Linux-only Python project for 3D animation generation using NVIDIA GPUs and CUDA. It integrates PyTorch, spconv, and pytorch3d, and provides a setup script that handles these complex dependencies.
github-stars python cuda pytorch 3d-animation Created Mon, 04 May 2026 10:23:02 +0000
DIMO: Distilling Diverse 3D Motion Priors for Arbitrary Object Motion Synthesis
DIMO distills motion priors from text-conditioned and multi-view video models into a shared latent space, enabling diverse 3D motion generation for arbitrary objects using 3D Gaussian splatting and 4D rendering.
github-stars python pytorch 3d-motion 3d-gaussian-splatting Created Mon, 04 May 2026 10:23:02 +0000
Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers that handle detection, segmentation, and OCR, using FlexAttention and efficient caching.
github-stars pytorch multimodal transformers cuda Created Mon, 04 May 2026 10:23:02 +0000
Lucebox Hub: hand-optimized CUDA kernels for efficient LLM inference on RTX 3090 and beyond
Lucebox Hub optimizes LLM inference on consumer GPUs using a CUDA megakernel approach and speculative decoding, achieving high throughput on the RTX 3090 and newer NVIDIA GPUs.
github-stars cuda llm gpu inference Created Mon, 04 May 2026 10:23:02 +0000
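Speculative decoding lets a cheap draft model propose several tokens that the large target model then verifies. The sketch below shows the generic greedy-verification variant, not Lucebox Hub's kernels or its exact scheme. Models are stood in for by hypothetical `NextToken` callables that return the greedy next token. With greedy verification, the output is identical to plain greedy decoding with the target model.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token function of a model


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The target checks each proposed position. A real system does this in one
        #    batched forward pass; here it is sequential calls for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if expected != tok:
                accepted.append(expected)  # first mismatch: keep the target's token, stop
                break
            accepted.append(tok)
        else:
            accepted.append(target(seq + accepted))  # all k accepted: one bonus token
        seq.extend(accepted)
    return seq[: len(prompt) + max_new]
```

Each round emits at least one target-approved token, so the speedup depends on how often the draft agrees with the target.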
OpenPose: real-time multi-person 2D pose estimation with constant-time body detection
OpenPose is a C++ library for real-time multi-person 2D pose estimation using Part Affinity Fields, enabling constant inference time for body detection regardless of person count.
github-stars c++ pose-estimation computer-vision cuda Created Mon, 04 May 2026 10:23:02 +0000
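Part Affinity Fields (PAFs) encode, for each limb type, a 2D vector field pointing along the limb. A candidate connection between two detected keypoints is scored by how well the field aligns with the segment joining them. The sketch below approximates that line integral by sampling evenly spaced points. It is a generic illustration, not OpenPose's C++ implementation. The field is a hypothetical callable and `num_samples` is an assumed parameter.

```python
import math
from typing import Callable, Tuple

Vec2 = Tuple[float, float]
Field = Callable[[float, float], Vec2]  # PAF for one limb type: location -> 2D vector


def paf_score(paf: Field, a: Vec2, b: Vec2, num_samples: int = 10) -> float:
    """Average alignment between the field and the unit vector from keypoint a to keypoint b."""
    if num_samples < 2:
        raise ValueError("need at least two samples")
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return 0.0
    ux, uy = dx / length, dy / length  # unit direction of the candidate limb
    total = 0.0
    for i in range(num_samples):
        t = i / (num_samples - 1)  # evenly spaced points from a to b, inclusive
        vx, vy = paf(a[0] + t * dx, a[1] + t * dy)
        total += vx * ux + vy * uy  # dot product: +1 aligned, -1 opposite
    return total / num_samples
```

Because these scores come from a single network pass over the whole image, the expensive part of inference does not grow with the number of people. Only the cheap matching of keypoint pairs does.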
Streaming 3D scene reconstruction with LingBot-Map’s geometric context transformer
LingBot-Map performs streaming 3D reconstruction from long image sequences at ~20 FPS using a geometric context transformer and paged KV cache attention for efficient memory management.
github-stars python 3d-reconstruction transformers streaming-inference Created Mon, 04 May 2026 10:23:02 +0000
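A paged KV cache stores keys and values in fixed-size physical blocks and gives each sequence a block table that maps logical positions to blocks, an idea popularized by vLLM's PagedAttention. The toy class below shows only that bookkeeping. It does not reproduce LingBot-Map's attention kernels, and `PagedKVCache` with its methods is hypothetical.

```python
from typing import Dict, List, Tuple


class PagedKVCache:
    """Toy paged KV cache: entries live in fixed-size blocks, addressed via per-sequence block tables."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.storage: Dict[int, List[Tuple[object, object]]] = {b: [] for b in range(num_blocks)}
        self.block_tables: Dict[str, List[int]] = {}

    def append(self, seq_id: str, key: object, value: object) -> None:
        table = self.block_tables.setdefault(seq_id, [])
        # Allocate a new physical block only when the last one is full (or none exists yet).
        if not table or len(self.storage[table[-1]]) == self.block_size:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop(0))
        self.storage[table[-1]].append((key, value))

    def get(self, seq_id: str, pos: int) -> Tuple[object, object]:
        # Logical position -> (physical block, offset within block).
        block = self.block_tables[seq_id][pos // self.block_size]
        return self.storage[block][pos % self.block_size]

    def free(self, seq_id: str) -> None:
        # Return every block of the sequence to the free pool.
        for block in self.block_tables.pop(seq_id, []):
            self.storage[block].clear()
            self.free_blocks.append(block)
```

Memory is committed one block at a time, so a long image stream never needs a single contiguous buffer sized for its maximum length.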
Cupid: feed-forward 3D reconstruction with joint camera pose estimation from single images
Cupid is a feed-forward 3D reconstruction model that jointly estimates camera pose and reconstructs 3D objects from single 2D images, outputting textured 3D meshes and radiance fields in seconds.
github-stars 3d-reconstruction computer-vision deep-learning cuda Created Mon, 04 May 2026 10:23:01 +0000
MR.ScaleMaster: heterogeneous multi-robot monocular SLAM fusion via Sim(3) optimization
MR.ScaleMaster fuses scale-ambiguous monocular SLAM trajectories from multiple robots using Sim(3) graph optimization, enabling heterogeneous SLAM frontends and consistent global maps.
github-stars robotics slam multi-robot monocular-slam Created Mon, 04 May 2026 10:23:01 +0000
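A Sim(3) transform maps a point p to s·R·p + t. The extra scale factor s, compared with a rigid SE(3) pose, is what lets monocular trajectories with different unknown scales be aligned. The sketch below shows only the group operations (apply and compose) under that standard definition, not MR.ScaleMaster's graph optimizer. `Sim3` and the matrix helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def matvec(m: Mat3, v: Vec3) -> Vec3:
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def matmul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


@dataclass
class Sim3:
    s: float  # scale
    R: Mat3   # rotation matrix
    t: Vec3   # translation

    def apply(self, p: Vec3) -> Vec3:
        """p -> s * R @ p + t"""
        rp = matvec(self.R, p)
        return tuple(self.s * rp[i] + self.t[i] for i in range(3))

    def compose(self, other: "Sim3") -> "Sim3":
        """self after other: (s1*s2, R1 @ R2, s1 * R1 @ t2 + t1)."""
        rt = matvec(self.R, other.t)
        return Sim3(self.s * other.s, matmul(self.R, other.R),
                    tuple(self.s * rt[i] + self.t[i] for i in range(3)))
```

In a pose graph, each edge would constrain a relative Sim(3) between robots or keyframes, and the optimizer adjusts all the transforms (including their scales) to agree.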
DeepEP: Optimizing communication for large Mixture-of-Experts models with CUDA kernels
DeepEP is a CUDA-based communication library designed for Mixture-of-Experts models, delivering high-throughput GPU kernels with NVLink and RDMA support for efficient expert parallelism.
github-stars cuda gpu mixture-of-experts expert-parallelism Created Sat, 02 May 2026 20:07:04 +0000
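In expert parallelism, each token is routed to its top-k experts. The dispatch step then sends each (token, expert) pair to the GPU that hosts that expert, and a combine step sends the results back. The sketch below shows only the routing bookkeeping that such a dispatch serves. It is not DeepEP's API or kernels, and `route_top_k`, `dispatch_plan`, and `experts_per_rank` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def route_top_k(scores: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """For each token, pick the k experts with the highest router scores."""
    return [sorted(range(len(row)), key=lambda e: row[e], reverse=True)[:k] for row in scores]


def dispatch_plan(routes: Sequence[Sequence[int]],
                  experts_per_rank: int) -> Dict[int, List[Tuple[int, int]]]:
    """Group (token, expert) pairs by destination rank: what an all-to-all dispatch would send."""
    plan: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for token, experts in enumerate(routes):
        for expert in experts:
            # Experts are assumed to be laid out contiguously, experts_per_rank per GPU.
            plan[expert // experts_per_rank].append((token, expert))
    return dict(plan)
```

The per-rank lists are uneven and change every batch, which is why dedicated NVLink and RDMA kernels for this all-to-all exchange matter at scale.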