Noureddine RAMDI / Paper2Any: multi-modal AI pipeline converting academic papers into editable scientific artifacts

Created Sat, 23 May 2026 20:41:14 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

OpenDCAI/Paper2Any

Academic papers are dense documents packed with figures, diagrams, and complex layouts. Turning that content from PDFs or screenshots into editable formats such as PowerPoint slides or DrawIO diagrams is notoriously difficult. Paper2Any tackles the problem with a multi-modal AI pipeline that orchestrates large language models (LLMs) to convert papers into editable scientific figures, technical diagrams, presentation slides, video scripts, posters, and even rebuttal drafts.

how Paper2Any structures paper-to-artifact conversion

Paper2Any is implemented in Python and built around a RESTful FastAPI backend paired with a React frontend. It orchestrates calls to multiple LLMs, including GPT-4o and Qwen-VL, and treats the problem as a document-to-document translation task followed by format-specific post-processing. The system accepts PDFs, screenshots, or raw text as input and produces editable output in formats such as PPTX, DrawIO XML, and SVG.

A key architectural feature is dynamic model selection, enabled by a three-layer configuration system. Users can override LLM providers and models either with a simple setup or with fine-grained, workflow-specific overrides. This design balances flexibility with usability by accommodating different deployment scenarios and model capabilities.
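One way to picture a layered lookup of this kind is a chain of fallbacks: a workflow-specific override wins, then a shared setting, then a built-in default. The sketch below is illustrative only. The variable names (`DEFAULT_LLM_MODEL`, `PAPER2PPT_LLM_MODEL`), the workflow names, and the layer order are assumptions made for the example, not Paper2Any's actual configuration keys.

```python
from typing import Mapping

# Hypothetical built-in default (lowest layer); not taken from the Paper2Any repo.
BUILTIN_DEFAULT_MODEL = "gpt-4o"


def resolve_model(workflow: str, env: Mapping[str, str]) -> str:
    """Return the model for a workflow, preferring the most specific setting."""
    # Most specific layer: a per-workflow override such as PAPER2PPT_LLM_MODEL.
    specific = env.get(f"{workflow.upper()}_LLM_MODEL", "").strip()
    if specific:
        return specific
    # Middle layer: one shared model for every workflow.
    shared = env.get("DEFAULT_LLM_MODEL", "").strip()
    if shared:
        return shared
    # Lowest layer: fall back to the built-in default.
    return BUILTIN_DEFAULT_MODEL


env = {"DEFAULT_LLM_MODEL": "qwen-vl", "PAPER2PPT_LLM_MODEL": "gpt-4o"}
print(resolve_model("paper2ppt", env))     # workflow override applies
print(resolve_model("paper2drawio", env))  # falls back to the shared model
```

Keeping the lookup in one function means a simple setup only needs the shared variable, while advanced setups add overrides without touching the rest.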

Paper2Any’s core modules include:

  • A submodule that generates DrawIO diagrams from paper content, enabling users to edit technical figures interactively.
  • A layout-preserving PDF-to-PPTX converter that retains the spatial positioning and structure of original slides or figures.
  • AI-assisted outline editing tools that help refine generated content before final export.
  • A knowledge base with embedding-based retrieval supporting retrieval-augmented generation (RAG) to create informed presentation slides.
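To make the DrawIO output format concrete, here is a minimal sketch of how a list of labeled boxes and arrows could be serialized into the uncompressed `mxGraphModel` XML that draw.io reads. It uses only the Python standard library. The function name `build_drawio`, the node tuples, and the fixed box size are assumptions for illustration and say nothing about how Paper2Any generates its diagrams.

```python
import xml.etree.ElementTree as ET


def build_drawio(nodes, edges):
    """Build a minimal mxGraphModel XML string.

    nodes: list of (node_id, label, x, y); edges: list of (source_id, target_id).
    """
    model = ET.Element("mxGraphModel")
    root = ET.SubElement(model, "root")
    # draw.io expects two structural cells: the root (0) and a default layer (1).
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")

    for node_id, label, x, y in nodes:
        cell = ET.SubElement(
            root, "mxCell", id=node_id, value=label,
            style="rounded=1;whiteSpace=wrap;html=1;", vertex="1", parent="1",
        )
        geom = ET.SubElement(cell, "mxGeometry", x=str(x), y=str(y),
                             width="120", height="40")
        geom.set("as", "geometry")  # "as" is a Python keyword, so set it explicitly

    for i, (src, dst) in enumerate(edges):
        cell = ET.SubElement(root, "mxCell", id=f"e{i}", edge="1", parent="1",
                             source=src, target=dst)
        geom = ET.SubElement(cell, "mxGeometry", relative="1")
        geom.set("as", "geometry")

    return ET.tostring(model, encoding="unicode")


xml_text = build_drawio(
    nodes=[("n1", "Encoder", 40, 40), ("n2", "Decoder", 240, 40)],
    edges=[("n1", "n2")],
)
print(xml_text)
```

Because every shape is an ordinary `mxCell`, the result stays editable: a user can move, relabel, or restyle each box after import.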

Deployment is handled with Docker, using an nginx reverse proxy in front of the FastAPI backend and React frontend.

technical strengths and design tradeoffs

What sets Paper2Any apart is its multi-modal chaining of LLM calls combined with structured output constraints tailored to various scientific artifact formats. Rather than a single monolithic LLM call, the pipeline decomposes the task into smaller, format-specific steps, each handled by specialized LLM prompts or models. This approach is meant to improve output quality and to preserve the semantic and visual structure of the source paper.
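The general pattern of chained calls with structured output constraints can be sketched in a few lines. In the example below, `fake_llm` is a stand-in that returns canned JSON so the code runs offline, and the prompt prefixes, key names, and `paper_to_slides` function are all hypothetical. It shows the shape of the technique (one small call per step, each reply validated before the next step uses it), not Paper2Any's actual prompts or code.

```python
import json
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM client; returns canned JSON for the demo."""
    if prompt.startswith("OUTLINE"):
        return json.dumps({"sections": ["Method", "Results"]})
    title = prompt.split(":", 1)[1].strip()
    return json.dumps({"title": title, "bullets": ["key point"]})


def call_structured(llm: Callable[[str], str], prompt: str,
                    required_keys: List[str]) -> Dict:
    """Call the model and reject any reply that is not JSON with the expected keys."""
    data = json.loads(llm(prompt))  # raises ValueError on malformed JSON
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"model output missing keys: {missing}")
    return data


def paper_to_slides(paper_text: str,
                    llm: Callable[[str], str] = fake_llm) -> List[Dict]:
    # Step 1: extract an outline from the paper.
    outline = call_structured(llm, f"OUTLINE: {paper_text}", ["sections"])
    # Step 2: one format-specific call per section, each validated on its own.
    return [
        call_structured(llm, f"SLIDE: {section}", ["title", "bullets"])
        for section in outline["sections"]
    ]


slides = paper_to_slides("We propose a new encoder.")
print([slide["title"] for slide in slides])  # ['Method', 'Results']
```

Validating each intermediate result keeps a malformed reply from propagating into later steps, which is the main practical argument for decomposing the task.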

The codebase exhibits a modular design, clearly separating concerns between backend API handling, frontend UI, LLM interaction, and format-specific processing. The three-layer configuration system adds complexity but offers critical flexibility for diverse workflows, supporting both simple and advanced user needs.

Tradeoffs include the added operational complexity of managing multiple LLM providers and models, which may impact deployment and cost management. Also, the reliance on commercial LLMs like GPT-4o may limit offline or fully open-source deployments.

From a user experience standpoint, the React frontend enables dynamic model selection, giving users control over which LLM to use for each workflow step. This is a practical feature for research teams experimenting with different AI capabilities.

The Docker-based deployment with nginx proxy simplifies running the stack in production or local environments but assumes some familiarity with containerization.

quick start

requirements

Paper2Any supports two configuration styles via environment variables:

  • Simple mode: recommended for most self-hosted users, using .env.simple.example files.
  • Advanced mode: for fine-grained workflow-specific model/provider overrides, using .env.example files.

To quickly get started with simple mode, run these commands:

cp fastapi_app/.env.simple.example fastapi_app/.env
cp frontend-workflow/.env.simple.example frontend-workflow/.env

For advanced users needing detailed overrides:

cp fastapi_app/.env.example fastapi_app/.env
cp frontend-workflow/.env.example frontend-workflow/.env

docker deployment

The project recommends using Docker with nginx as a reverse proxy. You can optionally enable a local SAM3 container by setting environment variables before running the Docker deployment script:

# Usually keep VITE_API_BASE_URL empty in Docker, because nginx proxies /api and /outputs
VITE_API_BASE_URL=

# Optional: enable local SAM3 container
SAM3_PORT=8021
SAM3_SERVER_URLS=

Start the services with the deployment script deploy/docker-up.sh.
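The reason an empty VITE_API_BASE_URL works behind a reverse proxy is that requests then use same-origin relative paths, which nginx can route to the backend. The helper below illustrates that idea in Python for consistency with the rest of the project. The function `build_url` and the `/api/health` path are hypothetical, and the real frontend may assemble its URLs differently.

```python
def build_url(base_url: str, path: str) -> str:
    """Join an optional API base URL with a path such as /api/health."""
    if not path.startswith("/"):
        path = "/" + path
    # An empty base yields a same-origin relative URL that the proxy can route.
    return base_url.rstrip("/") + path


print(build_url("", "/api/health"))                       # /api/health
print(build_url("http://localhost:8000/", "api/health"))  # http://localhost:8000/api/health
```

Setting an explicit base URL is then only needed when the frontend talks to a backend on a different origin.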

linux and windows notes

Linux users are encouraged to use Conda to create an isolated Python 3.11 environment, while Windows users are advised to run Paper2Any on Linux or WSL for better compatibility.

verdict

Paper2Any is a thoughtfully designed multi-modal AI pipeline that addresses a real pain point in academic research workflows: converting static papers into editable scientific figures, slides, and documents. Its modular architecture, multi-layer configuration, and dynamic model selection make it adaptable to various deployment and research needs.

The tradeoff is operational complexity: deploying and managing multiple LLMs along with backend and frontend services requires some infrastructure know-how. Its dependence on commercial LLMs may also limit offline use or increase costs. However, for teams invested in AI-assisted scientific document generation, Paper2Any offers a practical solution that goes beyond simple PDF conversion.

Exploring the repo reveals clean separation between the FastAPI backend, React frontend, and LLM orchestration logic, suggesting good maintainability. The inclusion of Docker deployment scripts and example environment configurations lowers the barrier for adoption.

If you frequently work with academic papers and need editable outputs for presentations, posters, or rebuttals, Paper2Any is worth evaluating. Its approach of chaining multi-modal LLM calls with structured output constraints is a solid pattern for similar AI-powered document transformation tasks.


→ GitHub Repo: OpenDCAI/Paper2Any ⭐ 2,351 · Python