Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation

7 results for Inference

Bytez: unified serverless inference across 220,000 AI models with a single API
Bytez offers a unified API for over 220,000 AI models with serverless GPU orchestration, abstracting model diversity into a single inference platform accessible with one API key.
github-stars typescript serverless ai machine learning Created Sat, 23 May 2026 20:41:14 +0000
Inside Mini-SGLang: A clear and modular Python LLM inference engine
Mini-SGLang is a modular Python reimplementation of the SGLang LLM inference engine with production features like Radix Cache, chunked prefill, overlap scheduling, and tensor parallelism.
github-stars python llm inference gpu Created Sat, 23 May 2026 20:41:14 +0000
Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.
github-stars pytorch multimodal transformers cuda Created Mon, 04 May 2026 10:23:02 +0000
Lucebox Hub: hand-optimized CUDA kernels for efficient LLM inference on RTX 3090 and beyond
Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on the RTX 3090 and newer NVIDIA GPUs.
github-stars cuda llm gpu inference Created Mon, 04 May 2026 10:23:02 +0000
MicroGPT-C: Coordinating tiny GPT-2 models in C for edge logical reasoning
MicroGPT-C uses a deterministic C scaffold to coordinate tiny GPT-2 models, achieving 90%+ accuracy on logic games with 8x memory compression and infinite sequence lengths.
github-stars c gpt-2 transformer edge-ai Created Mon, 04 May 2026 10:23:02 +0000
TextGen: a portable zero-config local LLM runner with multi-backend and multimodal support
TextGen is a portable desktop app for running local LLMs with zero telemetry and multi-backend support. Drop GGUF models into a folder and run them with no complex setup. It features multimodal vision, file attachments, and an OpenAI-compatible API.
github-stars python llm local-llm multimodal Created Mon, 04 May 2026 10:23:02 +0000
vLLM: Efficient large language model serving with paged attention and continuous batching
vLLM is a Python library for high-throughput LLM inference using paged attention and continuous batching. It supports quantization, distributed inference, and an OpenAI-compatible API.
github-stars python llm inference gpu Created Sat, 02 May 2026 20:07:04 +0000
