Dedoc is a Python library and REST API that extracts structured content from PDFs, Office files, images, and other document types. It relies on a virtual stack machine PDF interpreter and OCR preprocessing.
Excalibur provides a Flask web UI over Camelot for extracting tabular data from PDFs. It supports manual selection, auto-detection, and multiple backends and export formats.
KillerPDF is a standalone Windows PDF editor shipped as a single ~6 MB EXE. It bundles PDFium and annotation features, has zero dependencies, and works offline without a subscription.
MarkPDFDown is a Python CLI tool that converts PDFs and images into Markdown. It parses pages visually with vision-capable large language models, which lets it handle complex layouts and formulas.
pdftochat is a TypeScript-based PDF-to-chat app leveraging Chroma Cloud for hybrid vector search and Together.ai for LLMs, integrating multiple cloud services for scalable document Q&A.
rachoon is a TypeScript-based document processing service that combines PostgreSQL and Gotenberg for PDF generation. It has a modular, containerized architecture aimed at practical deployment.
Open PDF Studio is a cross-platform PDF editor built on a Rust backend and Tauri 2, with PDF.js rendering. It offers 20+ annotation tools, form support, and multi-platform installs, and is licensed under LGPL-3.0.
Stirling PDF offers 50+ PDF tools, a private REST API, and multi-platform deployment for self-hosted, no-code automated PDF workflows. Here’s how it works under the hood.