Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation

12 results for Multimodal

Inside InternVL: Open-Source Multimodal Large Language Models with Reinforcement Learning
InternVL offers open-source multimodal large language models that combine vision transformers with LLMs, featuring CascadeRL training and benchmark results competitive with models like GPT-4o.
github-stars multimodal llm reinforcement-learning vision-transformer Created Mon, 06 Jul 2026 15:15:52 +0000
OpenThinkIMG: Modular vision tool orchestration for enhanced multimodal inference
OpenThinkIMG enables modular orchestration of independent vision tools for enhanced inference workflows using PyTorch and service-based architecture. Clear quickstart included.
github-stars python pytorch vision multimodal Created Mon, 06 Jul 2026 15:15:52 +0000
npcpy: enforcing AI behavioral compliance through architecture for multimodal LLM apps
npcpy offers a unique NPC Context-Agent-Tool data layer to enforce AI compliance via software architecture, supporting multimodal LLM apps and multi-agent systems with local and cloud providers.
github-stars python llm agentic-ai multimodal Created Sat, 23 May 2026 20:41:14 +0000
OmniGen2: a unified multimodal generation model with separate decoding paths for text and images
OmniGen2 unifies visual understanding, text-to-image generation, and image editing using distinct decoding pathways for text and images, built on Qwen2.5-VL with CPU offloading for accessibility.
github-stars multimodal deep-learning pytorch image-generation Created Sat, 23 May 2026 20:41:14 +0000
MedRAX: orchestrating specialized AI tools for chest X-ray analysis with dynamic routing
MedRAX uses GPT-4o to dynamically route medical queries across multiple AI models for chest X-ray interpretation. It offers modular, tool-agnostic orchestration with a Gradio interface.
github-stars python agentic-ai medical-ai chest-xray Created Tue, 05 May 2026 16:46:42 +0000
daVinci-MagiHuman: Simplifying multimodal video and audio generation with a single-stream transformer
daVinci-MagiHuman uses a 15B-parameter single-stream transformer with a sandwich architecture to generate video and audio from text, achieving competitive quality and fast inference on a single H100 GPU.
github-stars python transformer multimodal video-generation Created Mon, 04 May 2026 10:23:02 +0000
Exploring Claude API integration patterns with anthropics/claude-cookbooks
anthropics/claude-cookbooks offers Jupyter Notebook recipes demonstrating practical Claude API usage, including sub-agent orchestration, multimodal vision, and RAG patterns.
github-stars ai llm claude-api python Created Mon, 04 May 2026 10:23:02 +0000
Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.
github-stars pytorch multimodal transformers cuda Created Mon, 04 May 2026 10:23:02 +0000
Inside Alibaba’s VRAG: Multimodal Retrieval-Augmented Generation with Dynamic Reasoning Graphs
Alibaba’s VRAG models reasoning as a dynamic DAG with multimodal memory and RL-based fine-grained credit assignment, supporting text, image, and video retrieval in a unified framework.
github-stars python multimodal rag reinforcement-learning Created Mon, 04 May 2026 10:23:02 +0000
Omni-Diffusion: unified any-to-any multimodal generation with masked discrete diffusion
Omni-Diffusion models text, image, and speech tokens jointly via masked discrete diffusion, enabling any-to-any multimodal generation with a single unified model.
github-stars python multimodal diffusion-model machine-learning Created Mon, 04 May 2026 10:23:02 +0000
TextGen: a portable zero-config local LLM runner with multi-backend and multimodal support
TextGen offers a portable desktop app for local LLMs with zero telemetry and multi-backend support. Drop GGUF models in a folder and run them with no complex setup. It features multimodal vision, file attachments, and an OpenAI-compatible API.
github-stars python llm local-llm multimodal Created Mon, 04 May 2026 10:23:02 +0000
Claudish: A versatile TypeScript CLI proxy bridging Claude Code with 580+ AI models
Claudish is a TypeScript CLI proxy that lets Claude Code work with 580+ AI models via OpenRouter, direct APIs, and local inference, enabling multimodal capabilities through vision proxying.
github-stars typescript cli ai proxy Created Mon, 04 May 2026 10:23:01 +0000