<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Multimodal on Noureddine RAMDI</title><link>https://ramdi.fr/tags/multimodal/</link><description>Recent content in Multimodal on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/multimodal/index.xml" rel="self" type="application/rss+xml"/><item><title>npcpy: enforcing AI behavioral compliance through architecture for multimodal LLM apps</title><link>https://ramdi.fr/github-stars/npcpy-enforcing-ai-behavioral-compliance-through-architecture-for-multimodal-llm-apps/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/npcpy-enforcing-ai-behavioral-compliance-through-architecture-for-multimodal-llm-apps/</guid><description>npcpy offers a unique NPC Context-Agent-Tool data layer to enforce AI compliance via software architecture, supporting multimodal LLM apps and multi-agent systems with local and cloud providers.</description></item><item><title>OmniGen2: a unified multimodal generation model with separate decoding paths for text and images</title><link>https://ramdi.fr/github-stars/omnigen2-a-unified-multimodal-generation-model-with-separate-decoding-paths-for-text-and-images/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/omnigen2-a-unified-multimodal-generation-model-with-separate-decoding-paths-for-text-and-images/</guid><description>OmniGen2 unifies visual understanding, text-to-image generation, and image editing using distinct decoding pathways for text and images, built on Qwen-VL-2.5 with CPU offloading for accessibility.</description></item><item><title>MedRAX: orchestrating specialized AI tools for chest X-ray analysis with dynamic 
routing</title><link>https://ramdi.fr/github-stars/medrax-orchestrating-specialized-ai-tools-for-chest-x-ray-analysis-with-dynamic-routing/</link><pubDate>Tue, 05 May 2026 16:46:42 +0000</pubDate><guid>https://ramdi.fr/github-stars/medrax-orchestrating-specialized-ai-tools-for-chest-x-ray-analysis-with-dynamic-routing/</guid><description>MedRAX uses GPT-4o to dynamically route medical queries across multiple AI models for chest X-ray interpretation. It offers modular, tool-agnostic orchestration with a Gradio interface.</description></item><item><title>daVinci-MagiHuman: Simplifying multimodal video and audio generation with a single-stream transformer</title><link>https://ramdi.fr/github-stars/davinci-magihuman-simplifying-multimodal-video-and-audio-generation-with-a-single-stream-transformer/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/davinci-magihuman-simplifying-multimodal-video-and-audio-generation-with-a-single-stream-transformer/</guid><description>daVinci-MagiHuman uses a 15B-parameter single-stream transformer with a sandwich architecture to generate video and audio from text, achieving competitive quality and fast inference on a single H100 GPU.</description></item><item><title>Exploring Claude API integration patterns with anthropics/claude-cookbooks</title><link>https://ramdi.fr/github-stars/exploring-claude-api-integration-patterns-with-anthropics-claude-cookbooks/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/exploring-claude-api-integration-patterns-with-anthropics-claude-cookbooks/</guid><description>anthropics/claude-cookbooks offers Jupyter Notebook recipes demonstrating practical Claude API usage, including sub-agent orchestration, multimodal vision, and RAG patterns.</description></item><item><title>Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and 
OCR</title><link>https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/</guid><description>Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.</description></item><item><title>Inside Alibaba’s VRAG: Multimodal Retrieval-Augmented Generation with Dynamic Reasoning Graphs</title><link>https://ramdi.fr/github-stars/inside-alibabas-vrag-multimodal-retrieval-augmented-generation-with-dynamic-reasoning-graphs/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/inside-alibabas-vrag-multimodal-retrieval-augmented-generation-with-dynamic-reasoning-graphs/</guid><description>Alibaba&amp;rsquo;s VRAG models reasoning as a dynamic DAG with multimodal memory and RL-based fine-grained credit assignment, supporting text, image, and video retrieval in a unified framework.</description></item><item><title>Omni-Diffusion: unified any-to-any multimodal generation with masked discrete diffusion</title><link>https://ramdi.fr/github-stars/omni-diffusion-unified-any-to-any-multimodal-generation-with-masked-discrete-diffusion/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/omni-diffusion-unified-any-to-any-multimodal-generation-with-masked-discrete-diffusion/</guid><description>Omni-Diffusion models text, image, and speech tokens jointly via masked discrete diffusion, enabling any-to-any multimodal generation with a single unified model.</description></item><item><title>TextGen: a portable zero-config local LLM runner with multi-backend and multimodal 
support</title><link>https://ramdi.fr/github-stars/textgen-a-portable-zero-config-local-llm-runner-with-multi-backend-and-multimodal-support/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/textgen-a-portable-zero-config-local-llm-runner-with-multi-backend-and-multimodal-support/</guid><description>TextGen offers a portable desktop app for local LLMs with zero telemetry and multi-backend support. Drop GGUF models in a folder and run with no complex setup. It features multimodal vision, file attachments, and an OpenAI-compatible API.</description></item><item><title>Claudish: A versatile TypeScript CLI proxy bridging Claude Code with 580+ AI models</title><link>https://ramdi.fr/github-stars/claudish-a-versatile-typescript-cli-proxy-bridging-claude-code-with-580-ai-models/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/claudish-a-versatile-typescript-cli-proxy-bridging-claude-code-with-580-ai-models/</guid><description>Claudish is a TypeScript CLI proxy that lets Claude Code work with 580+ AI models via OpenRouter, direct APIs, and local inference, enabling multimodal capabilities through vision proxying.</description></item></channel></rss>