<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Ocr on Noureddine RAMDI</title><link>https://ramdi.fr/tags/ocr/</link><description>Recent content in Ocr on Noureddine RAMDI</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 23 May 2026 20:41:27 +0000</lastBuildDate><atom:link href="https://ramdi.fr/tags/ocr/index.xml" rel="self" type="application/rss+xml"/><item><title>Comic Translate: AI-driven multi-language comic translation with full-page context</title><link>https://ramdi.fr/github-stars/comic-translate-ai-driven-multi-language-comic-translation-with-full-page-context/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/comic-translate-ai-driven-multi-language-comic-translation-with-full-page-context/</guid><description>Comic Translate uses advanced AI models and a multi-step pipeline for accurate comic translation across languages, combining speech bubble detection, OCR, and LLMs with full-page context.</description></item><item><title>Dedoc: Python library for structured document content extraction with a virtual stack machine PDF engine</title><link>https://ramdi.fr/github-stars/dedoc-python-library-for-structured-document-content-extraction-with-a-virtual-stack-machine-pdf-engine/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/dedoc-python-library-for-structured-document-content-extraction-with-a-virtual-stack-machine-pdf-engine/</guid><description>Dedoc is a Python library and REST API that extracts structured content from diverse documents including PDFs, Office files, and images using a unique virtual stack machine PDF interpreter and OCR preprocessing.</description></item><item><title>Inside Papermerge: an open-source OCR document management system with a scalable meta-repo architecture</title><link>https://ramdi.fr/github-stars/inside-papermerge-an-open-source-ocr-document-management-system-with-a-scalable-meta-repo-architecture/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/inside-papermerge-an-open-source-ocr-document-management-system-with-a-scalable-meta-repo-architecture/</guid><description>Papermerge is a Python-based open-source document management system for scanned files with OCR and full-text search, using a meta-repo pattern to scale its codebase.</description></item><item><title>Nougat: Vision Transformer OCR for academic PDFs extracting LaTeX math and tables</title><link>https://ramdi.fr/github-stars/nougat-vision-transformer-ocr-for-academic-pdfs-extracting-latex-math-and-tables/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/nougat-vision-transformer-ocr-for-academic-pdfs-extracting-latex-math-and-tables/</guid><description>Nougat is Meta&amp;rsquo;s neural OCR system for academic PDFs, extracting LaTeX math and tables into structured Markdown using a Vision Transformer encoder-decoder. It offers CLI, API, and training tools.</description></item><item><title>OCRFlux: GPU-Accelerated OCR with Python for High-Performance Document Processing</title><link>https://ramdi.fr/github-stars/ocrflux-gpu-accelerated-ocr-with-python-for-high-performance-document-processing/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/ocrflux-gpu-accelerated-ocr-with-python-for-high-performance-document-processing/</guid><description>OCRFlux is a Python OCR tool optimized for NVIDIA GPUs, enabling fast, high-quality OCR on documents using a conda environment and poppler-utils for PDF rendering.</description></item><item><title>Parsing bank statements with monopoly-core: a per-bank parser approach in Python</title><link>https://ramdi.fr/github-stars/parsing-bank-statements-with-monopoly-core-a-per-bank-parser-approach-in-python/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/parsing-bank-statements-with-monopoly-core-a-per-bank-parser-approach-in-python/</guid><description>Monopoly-core is a Python library and CLI for converting bank statement PDFs to CSV using per-bank parser classes. It supports 20+ banks, OCR, and safety checks.</description></item><item><title>pdf-document-layout-analysis: a dual-model PDF layout analysis microservice with Docker deployment</title><link>https://ramdi.fr/github-stars/pdf-document-layout-analysis-a-dual-model-pdf-layout-analysis-microservice-with-docker-deployment/</link><pubDate>Sat, 23 May 2026 20:41:14 +0000</pubDate><guid>https://ramdi.fr/github-stars/pdf-document-layout-analysis-a-dual-model-pdf-layout-analysis-microservice-with-docker-deployment/</guid><description>pdf-document-layout-analysis is a Dockerized microservice using Vision Grid Transformer and LightGBM for PDF layout analysis, offering high accuracy or fast processing with OCR, translation, and multi-format export.</description></item><item><title>TurboOCR: a GPU-accelerated OCR server optimized for raw pixel input and high throughput</title><link>https://ramdi.fr/github-stars/turboocr-a-gpu-accelerated-ocr-server-optimized-for-raw-pixel-input-and-high-throughput/</link><pubDate>Tue, 05 May 2026 13:37:39 +0000</pubDate><guid>https://ramdi.fr/github-stars/turboocr-a-gpu-accelerated-ocr-server-optimized-for-raw-pixel-input-and-high-throughput/</guid><description>TurboOCR is a C++/CUDA OCR server leveraging TensorRT FP16 for high throughput and low latency, featuring a zero-decode pixel pipeline and multi-protocol API.</description></item><item><title>Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR</title><link>https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/</guid><description>Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.</description></item><item><title>Inside Alibaba's Logics-Parsing-v2: end-to-end structured document parsing beyond OCR</title><link>https://ramdi.fr/github-stars/inside-alibaba-s-logics-parsing-v2-end-to-end-structured-document-parsing-beyond-ocr/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/inside-alibaba-s-logics-parsing-v2-end-to-end-structured-document-parsing-beyond-ocr/</guid><description>Alibaba&amp;rsquo;s Logics-Parsing-v2 converts complex document images into structured HTML, handling formulas, tables, flowcharts, music sheets, and pseudocode with a single model.</description></item><item><title>Inside Second Brain: A Python AI OS with self-extending plugins and hybrid search</title><link>https://ramdi.fr/github-stars/inside-second-brain-a-python-ai-os-with-self-extending-plugins-and-hybrid-search/</link><pubDate>Mon, 04 May 2026 10:23:02 +0000</pubDate><guid>https://ramdi.fr/github-stars/inside-second-brain-a-python-ai-os-with-self-extending-plugins-and-hybrid-search/</guid><description>Second Brain is a Python framework that indexes local files with embeddings, runs background subagents, and lets AI agents build and hot-load their own plugins at runtime.</description></item><item><title>Automating bank statement processing with YOLOv8, OCR, and LLMs for personal finance analysis</title><link>https://ramdi.fr/github-stars/automating-bank-statement-processing-with-yolov8-ocr-and-llms-for-personal-finance-analysis/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/automating-bank-statement-processing-with-yolov8-ocr-and-llms-for-personal-finance-analysis/</guid><description>Explore how a hybrid pipeline using YOLOv8 layout detection, OCR, and LLMs automates messy bank statement PDFs for personal finance analysis with RAG and AI agents.</description></item><item><title>deepseek_ocr_app: full-stack OCR with multi-format PDF export and real-time progress</title><link>https://ramdi.fr/github-stars/deepseek-ocr-app-full-stack-ocr-with-multi-format-pdf-export-and-real-time-progress/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/deepseek-ocr-app-full-stack-ocr-with-multi-format-pdf-export-and-real-time-progress/</guid><description>deepseek_ocr_app combines React and FastAPI to offer powerful OCR for images and multipage PDFs with exports to Markdown, HTML, DOCX, and JSON. It features real-time progress tracking and bounding box visualization.</description></item><item><title>DocStrange: A versatile Python library for LLM-optimized document parsing with dual-mode processing</title><link>https://ramdi.fr/github-stars/docstrange-a-versatile-python-library-for-llm-optimized-document-parsing-with-dual-mode-processing/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/docstrange-a-versatile-python-library-for-llm-optimized-document-parsing-with-dual-mode-processing/</guid><description>DocStrange converts PDFs, DOCX, PPTX, XLSX, images, and URLs into LLM-ready Markdown, JSON, HTML, and CSV. It offers free cloud and private local GPU modes for flexible, privacy-compliant document parsing.</description></item><item><title>Windrecorder: a local-first screen recorder with multi-engine OCR indexing</title><link>https://ramdi.fr/github-stars/windrecorder-a-local-first-screen-recorder-with-multi-engine-ocr-indexing/</link><pubDate>Mon, 04 May 2026 10:23:01 +0000</pubDate><guid>https://ramdi.fr/github-stars/windrecorder-a-local-first-screen-recorder-with-multi-engine-ocr-indexing/</guid><description>Windrecorder captures screen activity on Windows, indexes it with multiple OCR engines locally, and offers a searchable rewind UI—all without cloud dependencies.</description></item><item><title>Inside Tesseract OCR: from legacy character recognition to LSTM-based line recognition</title><link>https://ramdi.fr/github-stars/inside-tesseract-ocr-from-legacy-character-recognition-to-lstm-based-line-recognition/</link><pubDate>Sun, 26 Apr 2026 09:31:26 +0000</pubDate><guid>https://ramdi.fr/github-stars/inside-tesseract-ocr-from-legacy-character-recognition-to-lstm-based-line-recognition/</guid><description>Tesseract OCR evolved from a legacy character pattern engine to a modern LSTM-based line recognition system supporting 100+ languages and multiple output formats. Here&amp;rsquo;s a technical dive.</description></item></channel></rss>