npcpy offers a unique NPC Context-Agent-Tool data layer to enforce AI compliance via software architecture, supporting multimodal LLM apps and multi-agent systems with local and cloud providers.
OmniGen2 unifies visual understanding, text-to-image generation, and image editing using distinct decoding pathways for text and images, built on Qwen-VL-2.5 with CPU offloading for accessibility.
MedRAX uses GPT-4o to dynamically route medical queries across multiple AI models for chest X-ray interpretation. It offers modular, tool-agnostic orchestration with a Gradio interface.
daVinci-MagiHuman uses a 15B-parameter single-stream transformer with a sandwich architecture to generate video and audio from text, achieving competitive quality and fast inference on a single H100 GPU.
anthropics/claude-cookbooks offers Jupyter Notebook recipes demonstrating practical Claude API usage, including sub-agent orchestration, multimodal vision, and RAG patterns.
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.
Alibaba’s VRAG models reasoning as a dynamic DAG with multimodal memory and RL-based fine-grained credit assignment, supporting text, image, and video retrieval in a unified framework.
Omni-Diffusion models text, image, and speech tokens jointly via masked discrete diffusion, enabling any-to-any multimodal generation with a single unified model.
TextGen offers a portable desktop app for local LLMs with zero telemetry and multi-backend support. Drop GGUF models in a folder and run with no complex setup. It features multimodal vision, file attachments, and OpenAI-compatible API.
Claudish is a TypeScript CLI proxy that lets Claude Code work with 580+ AI models via OpenRouter, direct APIs, and local inference, enabling multimodal capabilities through vision proxying.