Bytez offers a unified API for over 220,000 AI models with serverless GPU orchestration, abstracting model diversity into a single inference platform accessible via one key.
Mini-SGLang is a modular Python reimplementation of the SGLang LLM inference engine with production features like Radix Cache, chunked prefill, overlap scheduling, and tensor parallelism.
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.
Lucebox Hub optimizes LLM inference on consumer GPUs using a megakernel CUDA approach and speculative decoding, achieving high throughput on RTX 3090 and newer Nvidia GPUs.
MicroGPT-C uses a deterministic C scaffold to coordinate tiny GPT-2 models, achieving 90%+ accuracy on logic games with 8x memory compression and infinite sequence lengths.
TextGen offers a portable desktop app for local LLMs with zero telemetry and multi-backend support. Drop GGUF models in a folder and run with no complex setup. It features multimodal vision, file attachments, and OpenAI-compatible API.
vLLM is a Python library for high-throughput LLM inference using paged attention and continuous batching. It supports quantization, distributed inference, and an OpenAI-compatible API.