Ocr on Noureddine RAMDI

https://ramdi.fr/tags/ocr/
Recent content in Ocr on Noureddine RAMDI
Generator: Hugo
Language: en
Last updated: Sat, 23 May 2026 20:41:27 +0000

Comic Translate: AI-driven multi-language comic translation with full-page context
https://ramdi.fr/github-stars/comic-translate-ai-driven-multi-language-comic-translation-with-full-page-context/
Sat, 23 May 2026 20:41:14 +0000
Comic Translate uses advanced AI models and a multi-step pipeline for accurate comic translation across languages, combining speech bubble detection, OCR, and LLMs with full-page context.

Dedoc: Python library for structured document content extraction with a virtual stack machine PDF engine
https://ramdi.fr/github-stars/dedoc-python-library-for-structured-document-content-extraction-with-a-virtual-stack-machine-pdf-engine/
Sat, 23 May 2026 20:41:14 +0000
Dedoc is a Python library and REST API that extracts structured content from diverse documents including PDFs, Office files, and images using a unique virtual stack machine PDF interpreter and OCR preprocessing.

Inside Papermerge: an open-source OCR document management system with a scalable meta-repo architecture
https://ramdi.fr/github-stars/inside-papermerge-an-open-source-ocr-document-management-system-with-a-scalable-meta-repo-architecture/
Sat, 23 May 2026 20:41:14 +0000
Papermerge is a Python-based open-source document management system for scanned files with OCR and full-text search, using a meta-repo pattern to scale its codebase.

Nougat: Vision Transformer OCR for academic PDFs extracting LaTeX math and tables
https://ramdi.fr/github-stars/nougat-vision-transformer-ocr-for-academic-pdfs-extracting-latex-math-and-tables/
Sat, 23 May 2026 20:41:14 +0000
Nougat is Meta’s neural OCR system for academic PDFs, extracting LaTeX math and tables into structured Markdown using a Vision Transformer encoder-decoder. It offers CLI, API, and training tools.

OCRFlux: GPU-Accelerated OCR with Python for High-Performance Document Processing
https://ramdi.fr/github-stars/ocrflux-gpu-accelerated-ocr-with-python-for-high-performance-document-processing/
Sat, 23 May 2026 20:41:14 +0000
OCRFlux is a Python OCR tool optimized for NVIDIA GPUs, enabling fast, high-quality OCR on documents using a conda environment and poppler-utils for PDF rendering.

Parsing bank statements with monopoly-core: a per-bank parser approach in Python
https://ramdi.fr/github-stars/parsing-bank-statements-with-monopoly-core-a-per-bank-parser-approach-in-python/
Sat, 23 May 2026 20:41:14 +0000
Monopoly-core is a Python library and CLI for converting bank statement PDFs to CSV using per-bank parser classes. It supports 20+ banks, OCR, and safety checks.

pdf-document-layout-analysis: a dual-model PDF layout analysis microservice with Docker deployment
https://ramdi.fr/github-stars/pdf-document-layout-analysis-a-dual-model-pdf-layout-analysis-microservice-with-docker-deployment/
Sat, 23 May 2026 20:41:14 +0000
pdf-document-layout-analysis is a Dockerized microservice using Vision Grid Transformer and LightGBM for PDF layout analysis, offering high accuracy or fast processing with OCR, translation, and multi-format export.

TurboOCR: a GPU-accelerated OCR server optimized for raw pixel input and high throughput
https://ramdi.fr/github-stars/turboocr-a-gpu-accelerated-ocr-server-optimized-for-raw-pixel-input-and-high-throughput/
Tue, 05 May 2026 13:37:39 +0000
TurboOCR is a C++/CUDA OCR server leveraging TensorRT FP16 for high throughput and low latency, featuring a zero-decode pixel pipeline and multi-protocol API.

Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR
https://ramdi.fr/github-stars/falcon-perception-a-minimal-multimodal-pytorch-engine-for-object-detection-segmentation-and-ocr/
Mon, 04 May 2026 10:23:02 +0000
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.

Inside Alibaba's Logics-Parsing-v2: end-to-end structured document parsing beyond OCR
https://ramdi.fr/github-stars/inside-alibaba-s-logics-parsing-v2-end-to-end-structured-document-parsing-beyond-ocr/
Mon, 04 May 2026 10:23:02 +0000
Alibaba’s Logics-Parsing-v2 converts complex document images into structured HTML, handling formulas, tables, flowcharts, music sheets, and pseudocode with a single model.

Inside Second Brain: A Python AI OS with self-extending plugins and hybrid search
https://ramdi.fr/github-stars/inside-second-brain-a-python-ai-os-with-self-extending-plugins-and-hybrid-search/
Mon, 04 May 2026 10:23:02 +0000
Second Brain is a Python framework that indexes local files with embeddings, runs background subagents, and lets AI agents build and hot-load their own plugins at runtime.

Automating bank statement processing with YOLOv8, OCR, and LLMs for personal finance analysis
https://ramdi.fr/github-stars/automating-bank-statement-processing-with-yolov8-ocr-and-llms-for-personal-finance-analysis/
Mon, 04 May 2026 10:23:01 +0000
Explore how a hybrid pipeline using YOLOv8 layout detection, OCR, and LLMs automates messy bank statement PDFs for personal finance analysis with RAG and AI agents.

deepseek_ocr_app: full-stack OCR with multi-format PDF export and real-time progress
https://ramdi.fr/github-stars/deepseek-ocr-app-full-stack-ocr-with-multi-format-pdf-export-and-real-time-progress/
Mon, 04 May 2026 10:23:01 +0000
deepseek_ocr_app combines React and FastAPI to offer powerful OCR for images and multipage PDFs with exports to Markdown, HTML, DOCX, and JSON. It features real-time progress tracking and bounding box visualization.

DocStrange: A versatile Python library for LLM-optimized document parsing with dual-mode processing
https://ramdi.fr/github-stars/docstrange-a-versatile-python-library-for-llm-optimized-document-parsing-with-dual-mode-processing/
Mon, 04 May 2026 10:23:01 +0000
DocStrange converts PDFs, DOCX, PPTX, XLSX, images, and URLs into LLM-ready Markdown, JSON, HTML, and CSV. It offers free cloud and private local GPU modes for flexible, privacy-compliant document parsing.

Windrecorder: a local-first screen recorder with multi-engine OCR indexing
https://ramdi.fr/github-stars/windrecorder-a-local-first-screen-recorder-with-multi-engine-ocr-indexing/
Mon, 04 May 2026 10:23:01 +0000
Windrecorder captures screen activity on Windows, indexes it with multiple OCR engines locally, and offers a searchable rewind UI, all without cloud dependencies.

Inside Tesseract OCR: from legacy character recognition to LSTM-based line recognition
https://ramdi.fr/github-stars/inside-tesseract-ocr-from-legacy-character-recognition-to-lstm-based-line-recognition/
Sun, 26 Apr 2026 09:31:26 +0000
Tesseract OCR evolved from a legacy character pattern engine to a modern LSTM-based line recognition system supporting 100+ languages and multiple output formats. Here’s a technical dive.