Bank statements are among the hardest financial documents to process automatically: messy formats, scanned images, wildly varying layouts, and complex tables make extracting structured data a headache. This repository tackles that problem by combining computer vision, OCR, and large language models (LLMs) in a multi-stage pipeline. The approach is practical and modular, and it aims to turn raw bank statement PDFs into detailed, queryable personal finance insights.
Automated bank statement parsing with layout detection and LLM structuring
At its core, this project builds a pipeline that automates the extraction and analysis of financial data from bank statements. It starts with YOLOv8, a state-of-the-art object detection model, to detect layout elements on the bank statement pages. This includes identifying tables, text blocks, and other key regions. The detected regions are then fed into OCR (Tesseract) to extract raw text.
But rather than stop at OCR, the pipeline uses large language models to parse and structure this raw text into meaningful financial data. The LLMs are employed to interpret extracted tables and text blocks, converting them into structured formats that can be queried and analyzed.
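To make the structuring step concrete, here is a minimal sketch of how an LLM reply could be validated into typed records. It is not the repository's code: the `Transaction` fields, the prompt wording, and the function names `build_prompt` and `parse_reply` are all assumptions chosen for illustration, and the actual model call is left out.

```python
import json
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """One statement row. Field names are illustrative, not the repo's schema."""
    posted: date
    description: str
    amount: Decimal  # negative = money out, positive = money in


def build_prompt(ocr_text: str) -> str:
    """Ask the model for strict JSON so the reply can be validated (hypothetical wording)."""
    return (
        "Extract every transaction from the bank statement text below.\n"
        'Reply with a JSON array of objects with keys "date" (YYYY-MM-DD), '
        '"description" and "amount" (string, negative for debits).\n\n'
        + ocr_text
    )


def parse_reply(reply: str) -> list[Transaction]:
    """Validate the model's reply; raises ValueError on malformed output."""
    try:
        rows = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array")
    transactions = []
    for row in rows:
        try:
            transactions.append(Transaction(
                posted=date.fromisoformat(row["date"]),
                description=str(row["description"]).strip(),
                amount=Decimal(str(row["amount"])),
            ))
        except (KeyError, TypeError, ValueError, ArithmeticError) as exc:
            raise ValueError(f"bad row {row!r}: {exc}") from exc
    return transactions
```

Rejecting malformed replies outright, rather than repairing them, keeps OCR noise and model slips from silently turning into wrong amounts; a caller could retry the request when `parse_reply` raises.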
This hybrid approach addresses the messy reality of bank statements, where formats vary widely and scanned images add noise. YOLOv8 provides robust layout detection across formats, while OCR handles text extraction, and LLMs bring contextual understanding and flexible parsing.
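The hand-off between detection and OCR can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the `Region` type, the label names, the 0.5 confidence threshold, and the `detect` and `ocr` callables are hypothetical stand-ins for a YOLOv8 wrapper and a Tesseract wrapper.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Region:
    """A detected layout element; box is (x1, y1, x2, y2) in pixels."""
    label: str                      # e.g. "table" or "text" (assumed label names)
    box: tuple[int, int, int, int]
    confidence: float


def reading_order(regions: Iterable[Region], min_conf: float = 0.5) -> list[Region]:
    """Drop low-confidence detections, then sort top-to-bottom, left-to-right."""
    kept = [r for r in regions if r.confidence >= min_conf]
    return sorted(kept, key=lambda r: (r.box[1], r.box[0]))


def extract_page(
    page: object,
    detect: Callable[[object], Iterable[Region]],
    ocr: Callable[[object, Region], str],
) -> list[tuple[str, str]]:
    """Run detection, then OCR each kept region; returns (label, text) pairs."""
    return [(r.label, ocr(page, r).strip()) for r in reading_order(detect(page))]
```

Passing the detector and OCR engine in as callables keeps each stage swappable and lets the ordering and filtering logic be tested without loading any model.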
The repository integrates a retrieval-augmented generation (RAG) pipeline using vector stores such as Chroma and FAISS. This enables natural language querying of the structured financial data. Users can ask questions about their income, expenses, and trends, and the system retrieves relevant context before generating answers.
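The retrieve-then-generate flow can be illustrated with a toy example. The sketch below swaps in a bag-of-words vector and a linear scan for the learned embeddings and the Chroma or FAISS index the project uses, so it shows only the shape of the technique; all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real setup would use a learned model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (linear scan, no index)."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt that would be sent to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"
```

A vector store replaces the linear scan in `retrieve` with an approximate nearest-neighbour index, which matters once there are many statements to search.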
On top of this, the project uses AG2, a multi-agent AI framework (migrated from pyautogen), to perform autonomous financial analysis. These AI agents analyze the extracted data to categorize income and expenses, identify trends, and generate predictions.
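For a sense of what categorizing income and expenses produces, here is a deterministic, plain-Python stand-in. It does not use AG2 and is not how the agents work; the keyword map, category names, and function names are made up for illustration. A baseline like this could also serve as a sanity check on agent output.

```python
from collections import defaultdict
from decimal import Decimal

# Keyword -> category map; purely illustrative, not taken from the repo.
CATEGORY_KEYWORDS = {
    "salary": "income",
    "grocery": "food",
    "electricity": "utilities",
}


def categorize(description: str) -> str:
    """Return the first matching category, or 'uncategorized'."""
    text = description.lower()
    for keyword, category in CATEGORY_KEYWORDS.items():
        if keyword in text:
            return category
    return "uncategorized"


def monthly_totals(rows: list[tuple[str, str, str]]) -> dict[str, dict[str, Decimal]]:
    """rows are (YYYY-MM-DD, description, amount); returns month -> category -> total."""
    totals: dict[str, dict[str, Decimal]] = defaultdict(lambda: defaultdict(Decimal))
    for posted, description, amount in rows:
        # The first seven characters of an ISO date are the YYYY-MM month key.
        totals[posted[:7]][categorize(description)] += Decimal(amount)
    return {month: dict(cats) for month, cats in totals.items()}
```

Amounts are kept as `Decimal` rather than `float` so that sums of currency values stay exact.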
The system supports both cloud-based LLMs (Google Gemini) and local models (Ollama), providing flexibility for deployment environments. A Streamlit frontend offers an interactive UI for uploading documents and querying the data.
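Switching between a cloud and a local model usually comes down to a small configuration layer. The sketch below is one way that could look, not the repository's mechanism: `LLM_BACKEND` and `OLLAMA_BASE_URL` are hypothetical variable names, while `GOOGLE_API_KEY` is the key the project expects and `http://localhost:11434` is Ollama's default local address.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    backend: str               # "gemini" or "ollama"
    api_key: Optional[str]     # set only for the cloud backend
    base_url: Optional[str]    # set only for the local backend


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Pick a backend from environment variables (variable names are assumptions)."""
    backend = env.get("LLM_BACKEND", "gemini").lower()
    if backend == "gemini":
        key = env.get("GOOGLE_API_KEY")
        if not key:
            raise RuntimeError("GOOGLE_API_KEY is required for the Gemini backend")
        return LLMConfig("gemini", key, None)
    if backend == "ollama":
        url = env.get("OLLAMA_BASE_URL", "http://localhost:11434")
        return LLMConfig("ollama", None, url)
    raise ValueError(f"unknown LLM_BACKEND: {backend!r}")
```

Failing fast on a missing key surfaces configuration mistakes at startup rather than in the middle of processing a statement.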
Development notebooks are included for experimentation and evaluation, including the use of DeepEval for assessing the quality of extraction and analysis.
Hybrid layout detection with YOLOv8 and LLM-powered extraction: technical strengths and tradeoffs
The standout technical feature is the hybrid pipeline that chains layout detection, OCR, and LLM-driven parsing. Many document parsing projects rely solely on OCR or heuristic table extraction, which often fails on complex or scanned PDFs. Here, YOLOv8’s robust layout detection acts as a precision filter, isolating key document regions before text extraction.
This reduces noise and improves the quality of the text fed to the LLMs downstream. Using LLMs to parse tables and text blocks is a smart tradeoff: it relies on the flexibility and contextual understanding of large models instead of brittle regex or rule-based methods.
However, this comes with some costs. Running YOLOv8 and local LLMs typically calls for GPU resources, and the pipeline can be slow compared to traditional parsers, especially on large document batches. The hybrid pipeline also adds complexity in orchestrating multiple ML components.
Code quality in the repo is decent, with clear separation of concerns: layout detection, OCR extraction, LLM parsing, vector indexing, and agent analysis are modularized. The use of Jupyter notebooks for development and evaluation supports iterative experimentation but may pose challenges for production deployment.
The multi-agent AG2 framework is a notable addition, enabling autonomous financial reasoning beyond simple extraction. It reflects an advanced AI orchestration pattern, although it is still under active development and integration.
Quick start for running the application
1. Clone & Setup
git clone https://github.com/johnsonhk88/AI-Bank-Statement-Document-Automation-By-LLM-And-Personal-Finanical-Analysis-Prediction.git
cd AI-Bank-Statement-Document-Automation-By-LLM-And-Personal-Finanical-Analysis-Prediction
# Setup virtual environment and install dependencies
./src/build-python-virual-environment.sh
./src/activate_virual_environment.sh
./src/install-requirement.sh
# Install Tesseract OCR (Ubuntu/Debian)
./src/install-pytesseract-for-linux.sh
Create a .env file with your GOOGLE_API_KEY for Gemini access.
2. Run the Application
Development notebooks for experimenting:
cd src/dev
jupyter notebook
Streamlit web UI for interaction:
cd src
streamlit run apps.py
Verdict: who should explore this repo
This project is a solid foundation for anyone interested in automating personal finance document processing with a hybrid machine learning approach. The combination of YOLOv8 layout detection, OCR, and LLM table parsing addresses the core pain point of messy, heterogeneous bank statement PDFs.
The modular architecture and inclusion of a RAG pipeline with vector stores enable natural language querying, which is increasingly valuable for personal finance applications.
However, this is not yet a turnkey production system. The use of Jupyter notebooks signals that much of the code is in research or prototype stage. GPU requirements and multi-component orchestration may complicate deployment.
If you want to explore advanced document AI pipelines combining vision and language models, or experiment with autonomous AI financial agents, this repo is worth a look. For production use, expect to invest effort in hardening, performance optimization, and UI polish.
Overall, the repo demonstrates a meaningful integration of multiple AI components to solve a real-world document parsing challenge, with honest tradeoffs around complexity and resource needs.
→ GitHub Repo: johnsonhk88/AI-Bank-Statement-Document-Automation-By-LLM-And-Personal-Finanical-Analysis-Prediction ⭐ 579 · Jupyter Notebook