paperetl is a Python ETL library that normalizes articles from PDF, PubMed, arXiv, TEI, and CSV sources into a unified article schema and stores the result in SQLite, JSON, YAML, or Elasticsearch.
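To make the idea of a unified article schema concrete, here is a minimal sketch using only the Python standard library. It does not use paperetl's API: the `Article` fields, the `normalize` helper, and the per-source key names are all hypothetical, chosen only to show how records from different sources can be mapped onto one shape and then written to SQLite or JSON.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class Article:
    """Hypothetical unified schema (not paperetl's actual columns)."""
    uid: str
    source: str
    title: str


def normalize(source: str, record: dict) -> Article:
    """Map a source-specific record onto the shared Article fields."""
    # Illustrative key names only; each real format has its own layout.
    key_map = {
        "pubmed": ("pmid", "ArticleTitle"),
        "arxiv": ("id", "title"),
        "csv": ("id", "title"),
    }
    id_key, title_key = key_map[source]
    return Article(str(record[id_key]), source, record[title_key].strip())


def save_sqlite(conn: sqlite3.Connection, articles: list) -> None:
    """Store normalized articles in a single table, replacing duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(uid TEXT PRIMARY KEY, source TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO articles VALUES (:uid, :source, :title)",
        [asdict(a) for a in articles],
    )
    conn.commit()


def to_json(articles: list) -> str:
    """Serialize the same normalized records as JSON."""
    return json.dumps([asdict(a) for a in articles], indent=2)


articles = [
    normalize("pubmed", {"pmid": 123, "ArticleTitle": " A study "}),
    normalize("arxiv", {"id": "2101.00001", "title": "Another study"}),
]
conn = sqlite3.connect(":memory:")
save_sqlite(conn, articles)
```

Because every record passes through the same `Article` shape, swapping the storage backend only changes the final write step, not the per-source parsing.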
Monopoly-core is a Python library and CLI for converting bank statement PDFs to CSV using per-bank parser classes. It supports 20+ banks, OCR, and safety checks.
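The per-bank parser pattern can be sketched as follows, using only the standard library. This is not Monopoly-core's API: `BankParser`, `ExampleBank`, the regular expression, and `totals_match` are hypothetical names, the statement text is assumed to be already extracted from the PDF, and the total comparison is just one plausible form a safety check could take.

```python
import csv
import io
import re
from dataclasses import dataclass


@dataclass
class Transaction:
    date: str
    description: str
    amount: float


class BankParser:
    """Hypothetical base class; subclasses supply a bank-specific line pattern."""
    name = "base"
    pattern: re.Pattern

    def parse(self, text: str) -> list:
        rows = []
        for line in text.splitlines():
            m = self.pattern.match(line.strip())
            if m:  # lines that do not look like transactions are skipped
                amount = float(m["amount"].replace(",", ""))
                rows.append(Transaction(m["date"], m["desc"].strip(), amount))
        return rows


class ExampleBank(BankParser):
    """Made-up bank whose rows look like: DD/MM  DESCRIPTION  AMOUNT."""
    name = "example"
    pattern = re.compile(
        r"(?P<date>\d{2}/\d{2})\s+(?P<desc>.+?)\s+(?P<amount>-?[\d,]+\.\d{2})$"
    )


def to_csv(transactions: list) -> str:
    """Write parsed transactions as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["date", "description", "amount"])
    for t in transactions:
        writer.writerow([t.date, t.description, f"{t.amount:.2f}"])
    return buf.getvalue()


def totals_match(transactions: list, stated_total: float, tolerance: float = 0.005) -> bool:
    """Assumed safety check: parsed amounts should sum to the statement total."""
    return abs(sum(t.amount for t in transactions) - stated_total) <= tolerance


statement_text = """\
01/02  COFFEE SHOP        4.50
03/02  GROCERY STORE  1,023.10
Total due 1,027.60
"""
transactions = ExampleBank().parse(statement_text)
csv_text = to_csv(transactions)
```

Keeping each bank's layout in its own subclass means supporting another bank is a matter of adding a class with a new pattern, while CSV writing and the total check stay shared.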