Noureddine RAMDI Dinour

Lead Developer & AI Enthusiast — Software Architecture, AI/LLM, Infrastructure Automation

16 results for OCR

Comic Translate: AI-driven multi-language comic translation with full-page context
Comic Translate uses advanced AI models and a multi-step pipeline for accurate comic translation across languages, combining speech bubble detection, OCR, and LLMs with full-page context.
github-stars python llm ocr computer-vision Created Sat, 23 May 2026 20:41:14 +0000
Dedoc: Python library for structured document content extraction with a virtual stack machine PDF engine
Dedoc is a Python library and REST API that extracts structured content from diverse documents including PDFs, Office files, and images using a unique virtual stack machine PDF interpreter and OCR preprocessing.
github-stars python document-extraction pdf ocr Created Sat, 23 May 2026 20:41:14 +0000
Inside Papermerge: an open-source OCR document management system with a scalable meta-repo architecture
Papermerge is a Python-based open-source document management system for scanned files with OCR and full-text search, using a meta-repo pattern to scale its codebase.
github-stars python ocr document-management digital-archives Created Sat, 23 May 2026 20:41:14 +0000
Nougat: Vision Transformer OCR for academic PDFs extracting LaTeX math and tables
Nougat is Meta’s neural OCR system for academic PDFs, extracting LaTeX math and tables into structured Markdown using a Vision Transformer encoder-decoder. It offers CLI, API, and training tools.
github-stars python ocr vision-transformer academic-pdfs Created Sat, 23 May 2026 20:41:14 +0000
OCRFlux: GPU-Accelerated OCR with Python for High-Performance Document Processing
OCRFlux is a Python OCR tool optimized for NVIDIA GPUs that delivers fast, high-quality document OCR. It is set up in a conda environment and relies on poppler-utils for PDF rendering.
github-stars python ocr gpu conda Created Sat, 23 May 2026 20:41:14 +0000
Parsing bank statements with monopoly-core: a per-bank parser approach in Python
Monopoly-core is a Python library and CLI for converting bank statement PDFs to CSV using per-bank parser classes. It supports 20+ banks, OCR, and safety checks.
github-stars python pdf-parsing cli bank-statements Created Sat, 23 May 2026 20:41:14 +0000
pdf-document-layout-analysis: a dual-model PDF layout analysis microservice with Docker deployment
pdf-document-layout-analysis is a Dockerized microservice for PDF layout analysis that offers two models, a Vision Grid Transformer for high accuracy and LightGBM for fast processing, along with OCR, translation, and multi-format export.
github-stars python docker ocr transformers Created Sat, 23 May 2026 20:41:14 +0000
TurboOCR: a GPU-accelerated OCR server optimized for raw pixel input and high throughput
TurboOCR is a C++/CUDA OCR server leveraging TensorRT FP16 for high throughput and low latency, featuring a zero-decode pixel pipeline and multi-protocol API.
github-stars cpp cuda ocr tensorrt Created Tue, 05 May 2026 13:37:39 +0000
Falcon-Perception: a minimal multimodal PyTorch engine for object detection, segmentation, and OCR
Falcon-Perception is a PyTorch engine for multimodal autoregressive Transformers handling detection, segmentation, and OCR with FlexAttention and efficient caching.
github-stars pytorch multimodal transformers cuda Created Mon, 04 May 2026 10:23:02 +0000
Inside Alibaba's Logics-Parsing-v2: end-to-end structured document parsing beyond OCR
Alibaba’s Logics-Parsing-v2 converts complex document images into structured HTML, handling formulas, tables, flowcharts, music sheets, and pseudocode with a single model.
github-stars python document-parsing machine-learning ocr Created Mon, 04 May 2026 10:23:02 +0000
Inside Second Brain: A Python AI OS with self-extending plugins and hybrid search
Second Brain is a Python framework that indexes local files with embeddings, runs background subagents, and lets AI agents build and hot-load their own plugins at runtime.
github-stars python ai-agent embeddings telegram-bot Created Mon, 04 May 2026 10:23:02 +0000
Automating bank statement processing with YOLOv8, OCR, and LLMs for personal finance analysis
Explore how a hybrid pipeline using YOLOv8 layout detection, OCR, and LLMs automates the processing of messy bank statement PDFs for personal finance analysis with RAG and AI agents.
github-stars python llm ocr yolov8 Created Mon, 04 May 2026 10:23:01 +0000
deepseek_ocr_app: full-stack OCR with multi-format PDF export and real-time progress
deepseek_ocr_app combines React and FastAPI to offer powerful OCR for images and multipage PDFs with exports to Markdown, HTML, DOCX, and JSON. It features real-time progress tracking and bounding box visualization.
github-stars ocr fastapi react deep-learning Created Mon, 04 May 2026 10:23:01 +0000
DocStrange: A versatile Python library for LLM-optimized document parsing with dual-mode processing
DocStrange converts PDFs, DOCX, PPTX, XLSX, images, and URLs into LLM-ready Markdown, JSON, HTML, and CSV. It offers free cloud and private local GPU modes for flexible, privacy-compliant document parsing.
github-stars python document-processing ocr llm Created Mon, 04 May 2026 10:23:01 +0000
Windrecorder: a local-first screen recorder with multi-engine OCR indexing
Windrecorder captures screen activity on Windows, indexes it with multiple OCR engines locally, and offers a searchable rewind UI—all without cloud dependencies.
github-stars python windows screen-recording ocr Created Mon, 04 May 2026 10:23:01 +0000
Inside Tesseract OCR: from legacy character recognition to LSTM-based line recognition
Tesseract OCR evolved from a legacy character pattern engine to a modern LSTM-based line recognition system supporting 100+ languages and multiple output formats. Here’s a technical deep dive.
github-stars ocr c++ lstm neural-networks Created Sun, 26 Apr 2026 09:31:26 +0000