Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation

5 results for Quantization

vLLM Compressor: Practical quantization and compression for large language model inference
vLLM Compressor applies advanced quantization and compression techniques to large language models, enabling optimized inference without requiring full model definitions.
Tags: github-stars, python, llm, quantization, compression
Created Sat, 23 May 2026 20:41:14 +0000

LiteRT-LM: Google's C++ library for efficient edge language model inference
LiteRT-LM is a Google AI Edge C++ library for performant language model inference on edge devices with multi-language API support and easy CLI usage.
Tags: github-stars, cpp, language-models, quantization, edge-ai
Created Mon, 04 May 2026 10:23:02 +0000

Lucebox Hub: hand-optimized CUDA kernels for efficient LLM inference on RTX 3090 and beyond
Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on RTX 3090 and newer Nvidia GPUs.
Tags: github-stars, cuda, llm, gpu, inference
Created Mon, 04 May 2026 10:23:02 +0000

A hands-on course for mastering large language models: fine-tuning, quantization, and tooling
Explore a comprehensive LLM course with practical notebooks on fine-tuning (QLoRA, DPO), quantization (GPTQ), and tools like AutoEval and LazyMergekit. Ideal for aspiring LLM engineers.
Tags: github-stars, llm, fine-tuning, quantization, python
Created Sat, 02 May 2026 20:07:04 +0000

LlamaFactory: modular, extensible fine-tuning framework for large language models
LlamaFactory offers a modular Python framework for fine-tuning 100+ LLMs with diverse algorithms and optimizations, including LoRA, QLoRA, and reinforcement learning.
Tags: github-stars, python, llm, fine-tuning, machine-learning
Created Sat, 02 May 2026 20:07:04 +0000
