Quantization on Noureddine RAMDI

https://ramdi.fr/tags/quantization/
Recent content in Quantization on Noureddine RAMDI
Last updated: Sat, 23 May 2026 20:41:27 +0000

vLLM Compressor: Practical quantization and compression for large language model inference
https://ramdi.fr/github-stars/vllm-compressor-practical-quantization-and-compression-for-large-language-model-inference/
Sat, 23 May 2026 20:41:14 +0000
vLLM Compressor applies advanced quantization and compression techniques to large language models, enabling optimized inference without requiring full model definitions.

LiteRT-LM: Google's C++ library for efficient edge language model inference
https://ramdi.fr/github-stars/litert-lm-google-s-c-library-for-efficient-edge-language-model-inference/
Mon, 04 May 2026 10:23:02 +0000
LiteRT-LM is a Google AI Edge C++ library for performant language model inference on edge devices with multi-language API support and easy CLI usage.

Lucebox Hub: hand-optimized CUDA kernels for efficient LLM inference on RTX 3090 and beyond
https://ramdi.fr/github-stars/lucebox-hub-hand-optimized-cuda-kernels-for-efficient-llm-inference-on-rtx-3090-and-beyond/
Mon, 04 May 2026 10:23:02 +0000
Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on RTX 3090 and newer Nvidia GPUs.

A hands-on course for mastering large language models: fine-tuning, quantization, and tooling
https://ramdi.fr/github-stars/a-hands-on-course-for-mastering-large-language-models-fine-tuning-quantization-and-tooling/
Sat, 02 May 2026 20:07:04 +0000
Explore a comprehensive LLM course with practical notebooks on fine-tuning (QLoRA, DPO), quantization (GPTQ), and tools like AutoEval and LazyMergekit. Ideal for aspiring LLM engineers.

LlamaFactory: modular, extensible fine-tuning framework for large language models
https://ramdi.fr/github-stars/llamafactory-modular-extensible-fine-tuning-framework-for-large-language-models/
Sat, 02 May 2026 20:07:04 +0000
LlamaFactory offers a modular Python framework for fine-tuning 100+ LLMs with diverse algorithms and optimizations, including LoRA, QLoRA, and reinforcement learning.