Academic papers are dense documents packed with figures, diagrams, and complex layouts, and extracting editable scientific content from PDFs or screenshots into familiar formats like PowerPoint slides or DrawIO diagrams is notoriously challenging. Paper2Any tackles this problem with a multi-modal AI pipeline that orchestrates large language models (LLMs) to convert academic papers into editable scientific figures, technical diagrams, presentation slides, video scripts, posters, and even rebuttal drafts.
how Paper2Any structures paper-to-artifact conversion
Paper2Any is implemented in Python and built around a RESTful FastAPI backend paired with a React frontend. It orchestrates calls to multiple LLMs, including GPT-4o and Qwen-VL, treating the problem as a document-to-document translation task with format-specific post-processing. The system accepts PDFs, screenshots, or raw text as input and produces editable scientific content in formats like PPTX, DrawIO XML, and SVG.
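To make the DrawIO XML output format concrete, here is a minimal sketch of what an editable diagram file looks like when built programmatically. It is not Paper2Any's generator: the function name build_drawio and the left-to-right layout are assumptions made for illustration, and only the general mxfile/mxGraphModel/mxCell structure comes from the draw.io file format itself.

```python
import xml.etree.ElementTree as ET


def build_drawio(nodes, edges):
    """Return a minimal draw.io XML string (hypothetical helper, for illustration).

    nodes: list of (node_id, label) pairs
    edges: list of (source_id, target_id) pairs
    """
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", {"name": "Page-1"})
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")

    # draw.io files start with two structural cells: the root and the default layer.
    ET.SubElement(root, "mxCell", {"id": "0"})
    ET.SubElement(root, "mxCell", {"id": "1", "parent": "0"})

    for index, (node_id, label) in enumerate(nodes):
        cell = ET.SubElement(root, "mxCell", {
            "id": node_id,
            "value": label,
            "style": "rounded=1;whiteSpace=wrap;html=1;",
            "vertex": "1",
            "parent": "1",
        })
        # Place boxes left to right; a real generator would compute a layout.
        ET.SubElement(cell, "mxGeometry", {
            "x": str(40 + index * 200), "y": "40",
            "width": "140", "height": "60", "as": "geometry",
        })

    for index, (source, target) in enumerate(edges):
        cell = ET.SubElement(root, "mxCell", {
            "id": f"edge-{index}",
            "style": "endArrow=classic;html=1;",
            "edge": "1",
            "parent": "1",
            "source": source,
            "target": target,
        })
        ET.SubElement(cell, "mxGeometry", {"relative": "1", "as": "geometry"})

    return ET.tostring(mxfile, encoding="unicode")


if __name__ == "__main__":
    print(build_drawio([("n1", "Encoder"), ("n2", "Decoder")], [("n1", "n2")]))
```

Because every box and arrow is a separate cell with its own geometry, the result stays editable once opened in the diagram editor, which is the point of targeting this format instead of a flat image.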
A key architectural feature is dynamic model selection, enabled by a three-layer configuration system. Users can override LLM providers and models either through a simple setup or through fine-grained, workflow-specific overrides. This design balances flexibility with usability by accommodating different deployment scenarios and model capabilities.
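A layered override scheme of this kind can be sketched in a few lines. The article does not say what the three layers are, so the sketch assumes built-in defaults, a global override, and a workflow-specific override, with the most specific layer winning. The variable names (LLM_MODEL, PAPER2PPT_LLM_MODEL and so on), the workflow names, and the function resolve_model are all hypothetical and are not taken from the Paper2Any codebase.

```python
from typing import Dict, Mapping

# Layer 1 (assumed): defaults shipped with the application.
BUILTIN_DEFAULTS: Dict[str, str] = {"provider": "openai", "model": "gpt-4o"}


def resolve_model(workflow: str, env: Mapping[str, str]) -> Dict[str, str]:
    """Resolve provider and model for one workflow from layered settings.

    All environment variable names below are hypothetical.
    """
    resolved = dict(BUILTIN_DEFAULTS)
    for key in ("provider", "model"):
        # Layer 2 (assumed): one global override, e.g. LLM_MODEL.
        global_value = env.get(f"LLM_{key.upper()}")
        if global_value:
            resolved[key] = global_value
        # Layer 3 (assumed): per-workflow override, e.g. PAPER2PPT_LLM_MODEL.
        workflow_value = env.get(f"{workflow.upper()}_LLM_{key.upper()}")
        if workflow_value:
            resolved[key] = workflow_value
    return resolved


if __name__ == "__main__":
    env = {"LLM_MODEL": "qwen-vl", "PAPER2PPT_LLM_MODEL": "gpt-4o-mini"}
    print(resolve_model("paper2ppt", env))     # workflow override applies
    print(resolve_model("paper2drawio", env))  # falls back to the global override
```

Passing the environment in as a mapping, instead of reading os.environ inside the function, keeps the resolution logic easy to test and makes the precedence order explicit.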
Paper2Any’s core modules include:
- A submodule that generates DrawIO diagrams from paper content, enabling users to edit technical figures interactively.
- A layout-preserving PDF-to-PPTX converter that retains the spatial positioning and structure of original slides or figures.
- AI-assisted outline editing tools that help refine generated content before final export.
- A knowledge base with embedding-based retrieval supporting retrieval-augmented generation (RAG) to create informed presentation slides.
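The embedding-based retrieval in the last item can be illustrated with a small sketch. Everything here is an assumption for illustration: the embed function is a toy bag-of-words counter standing in for a real embedding model, and retrieve and build_prompt are hypothetical names, not Paper2Any functions. The sketch only shows the general RAG pattern of ranking stored chunks by cosine similarity and placing the best matches in the prompt.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=2):
    """Return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)),
                    reverse=True)
    return ranked[:top_k]


def build_prompt(query, chunks):
    """Assemble a slide-drafting prompt grounded in the retrieved chunks."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query, chunks))
    return f"Use the context to draft one slide.\nContext:\n{context}\nTopic: {query}"


if __name__ == "__main__":
    knowledge_base = [
        "attention layers weigh token pairs",
        "the dataset has ten thousand images",
        "training used the adam optimizer",
    ]
    print(build_prompt("attention layers", knowledge_base))
```

In a real deployment the chunk vectors would be computed once and stored, so only the query needs embedding at request time.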
Deployment is streamlined with Docker, with an nginx reverse proxy sitting in front of the FastAPI backend and React frontend.
technical strengths and design tradeoffs
What sets Paper2Any apart is its multi-modal chaining of LLM calls combined with structured output constraints tailored to various scientific artifact formats. Rather than a single monolithic LLM call, the pipeline decomposes the task into smaller, format-specific steps, each handled by specialized LLM prompts or models. This approach improves output quality and preserves the semantic and visual structure of the source academic paper.
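The pattern of chaining small LLM steps under structured output constraints can be sketched as follows. This is an illustration of the general technique, not Paper2Any's pipeline: the step names, prompts, and required JSON keys are hypothetical, and the LLM is modeled as any callable that maps a prompt string to reply text so the sketch runs without network access.

```python
import json
from typing import Callable, Dict, List, Set

LLM = Callable[[str], str]  # prompt in, raw model text out (stand-in for a real client)


def call_structured(llm: LLM, prompt: str, required_keys: Set[str],
                    retries: int = 2) -> Dict:
    """Call an LLM and accept the reply only if it is a JSON object with the required keys."""
    feedback = ""
    for _ in range(retries + 1):
        reply = llm(prompt + feedback)
        try:
            data = json.loads(reply)
        except json.JSONDecodeError as exc:
            # Feed the failure back so the next attempt can correct itself.
            feedback = f"\nPrevious reply was not valid JSON: {exc.msg}"
            continue
        if isinstance(data, dict) and required_keys.issubset(data):
            return data
        feedback = f"\nPrevious reply must be an object with keys: {sorted(required_keys)}"
    raise ValueError("model never produced a valid structured reply")


def paper_to_slides(paper_text: str, outline_llm: LLM, slide_llm: LLM) -> List[Dict]:
    """Two chained steps: outline the paper, then draft one slide per section."""
    outline = call_structured(
        outline_llm, f"Outline this paper as JSON:\n{paper_text}", {"sections"})
    slides = []
    for section in outline["sections"]:
        slide = call_structured(
            slide_llm, f"Write a slide as JSON for: {section}", {"title", "bullets"})
        slides.append(slide)
    return slides


if __name__ == "__main__":
    def fake_outline(prompt: str) -> str:
        return json.dumps({"sections": ["Method", "Results"]})

    def fake_slide(prompt: str) -> str:
        topic = prompt.rsplit(": ", 1)[-1].splitlines()[0]
        return json.dumps({"title": topic, "bullets": ["placeholder"]})

    print(paper_to_slides("paper text", fake_outline, fake_slide))
```

Validating each step's reply before passing it on means a malformed response is caught and retried at the step that produced it, instead of surfacing later as a broken export file.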
The codebase exhibits a modular design, clearly separating concerns between backend API handling, frontend UI, LLM interaction, and format-specific processing. The three-layer configuration system adds complexity but offers critical flexibility for diverse workflows, supporting both simple and advanced user needs.
Tradeoffs include the added operational complexity of managing multiple LLM providers and models, which may impact deployment and cost management. Also, the reliance on commercial LLMs like GPT-4o may limit offline or fully open-source deployments.
From a user experience standpoint, the React frontend enables dynamic model selection, giving users control over which LLM to use for each workflow step. This is a practical feature for research teams experimenting with different AI capabilities.
The Docker-based deployment with nginx proxy simplifies running the stack in production or local environments but assumes some familiarity with containerization.
quick start
requirements
Paper2Any supports two configuration styles via environment variables:
- Simple mode: recommended for most self-hosted users, using .env.simple.example files.
- Advanced mode: for fine-grained workflow-specific model/provider overrides, using .env.example files.
To quickly get started with simple mode, run these commands:
cp fastapi_app/.env.simple.example fastapi_app/.env
cp frontend-workflow/.env.simple.example frontend-workflow/.env
For advanced users needing detailed overrides:
cp fastapi_app/.env.example fastapi_app/.env
cp frontend-workflow/.env.example frontend-workflow/.env
docker deployment
The project recommends using Docker with nginx as a reverse proxy. You can optionally enable a local SAM3 container by setting environment variables before running the Docker deployment script:
# Usually keep VITE_API_BASE_URL empty in Docker, because nginx proxies /api and /outputs
VITE_API_BASE_URL=
# Optional: enable local SAM3 container
SAM3_PORT=8021
SAM3_SERVER_URLS=
Start the services with the deployment script deploy/docker-up.sh.
linux and windows notes
Linux users are encouraged to use Conda for a Python 3.11 isolated environment, while Windows users are advised to run Paper2Any on Linux or WSL for better compatibility.
verdict
Paper2Any is a thoughtfully designed multi-modal AI pipeline that addresses a real pain point in academic research workflows: converting static, unstructured papers into editable scientific figures, slides, and documents. Its modular architecture, multi-layer configuration, and dynamic model selection make it flexible and adaptable to various deployment and research needs.
The tradeoff is operational complexity: deploying and managing multiple LLMs along with backend and frontend services requires some infrastructure know-how. Its dependence on commercial LLMs may also limit offline use or increase costs. However, for teams invested in AI-assisted scientific document generation, Paper2Any provides a practical solution that goes beyond simple PDF conversion.
Exploring the repo reveals clean separation between the FastAPI backend, React frontend, and LLM orchestration logic, suggesting good maintainability. The inclusion of Docker deployment scripts and example environment configurations lowers the barrier for adoption.
If you frequently work with academic papers and need editable outputs for presentations, posters, or rebuttals, Paper2Any is worth evaluating. Its approach of chaining multi-modal LLM calls with structured output constraints is a solid pattern for similar AI-powered document transformation tasks.
→ GitHub Repo: OpenDCAI/Paper2Any ⭐ 2,351 · Python