OASIS: a Python CLI for AI-driven code vulnerability scanning with deterministic validation

OASIS tackles a real pain point in security auditing: how to reliably detect and validate code vulnerabilities using AI without drowning in false positives. It orchestrates multiple large language models (LLMs) through a LangGraph pipeline, combining lightweight initial scans with heavier deep analyses, then runs a deterministic validation agent that produces citation-backed exploitability verdicts with confidence scores. This blend of deterministic logic and LLM-generated narrative constrained by guardrails is a practical pattern for AI security tooling.

What OASIS does and how it works

OASIS is a Python-based command-line interface tool designed for security auditing of codebases. It uses LangGraph, a multi-agent orchestration framework, to coordinate several Ollama LLMs in a two-phase scanning strategy. The first phase employs lightweight, smaller models for quick vulnerability discovery. Findings from this phase are then analyzed in-depth by larger, heavier models during the second phase to validate and refine results.
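The two-phase strategy can be illustrated with a minimal, framework-free sketch. It does not use LangGraph or Ollama and is not OASIS's actual code: the `Finding` type, the `Scanner` callable signature, the `threshold` parameter, and the stub scanners are all hypothetical names introduced here to show the control flow only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    path: str
    vuln_type: str
    score: float  # 0.0-1.0, higher means more suspicious (illustrative scale)


# Hypothetical model interface: (file path, source text) -> findings
Scanner = Callable[[str, str], list[Finding]]


def two_phase_scan(
    files: dict[str, str],
    light: Scanner,
    deep: Scanner,
    threshold: float = 0.5,
) -> list[Finding]:
    """Run the cheap scanner on every file, the costly one only on flagged files."""
    # Phase 1: the lightweight model narrows the scope.
    candidates = {
        path: source
        for path, source in files.items()
        if any(f.score >= threshold for f in light(path, source))
    }
    # Phase 2: the heavier model analyzes only the surviving candidates.
    results: list[Finding] = []
    for path, source in candidates.items():
        results.extend(deep(path, source))
    return results


# Stub scanners standing in for real LLM calls (assumptions for the example).
def keyword_light(path: str, source: str) -> list[Finding]:
    return [Finding(path, "sql_injection", 0.8)] if "execute(" in source else []


def stub_deep(path: str, source: str) -> list[Finding]:
    return [Finding(path, "sql_injection", 0.9)]
```

With these stubs, only files that the light scanner flags ever reach `stub_deep`, which is the cost-saving property the two-phase design is after.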

A standout architectural feature is the deterministic finding validation agent. Instead of relying solely on probabilistic LLM outputs, this agent runs code-driven investigations on selected findings, producing verdicts on exploitability that are backed by citations and confidence scores. It also ensures that any optional LLM-generated narrative describing the findings remains consistent with the deterministic results, reducing false positives and improving trust in the tool’s output.
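A rough sketch of the pattern, under stated assumptions: the check below is a deliberately naive substring search, the confidence values are arbitrary placeholders, and `Verdict`, `validate_finding`, and `narrative_is_consistent` are hypothetical names rather than anything taken from the OASIS codebase. It shows the shape of the idea only: code produces the verdict and citations, and a guardrail rejects narrative that contradicts them.

```python
from dataclasses import dataclass, field


@dataclass
class Verdict:
    exploitable: bool
    confidence: float  # placeholder values, not a calibrated score
    citations: list[str] = field(default_factory=list)  # "path:line" evidence


def validate_finding(path: str, source: str, sink: str, sanitizer: str) -> Verdict:
    """Deterministic check: cite each line containing the sink or the sanitizer."""
    lines = source.splitlines()
    sink_hits = [f"{path}:{n}" for n, line in enumerate(lines, 1) if sink in line]
    sanitizer_hits = [
        f"{path}:{n}" for n, line in enumerate(lines, 1) if sanitizer in line
    ]
    if not sink_hits:
        return Verdict(False, 0.9, [])
    if sanitizer_hits:
        return Verdict(False, 0.6, sink_hits + sanitizer_hits)
    return Verdict(True, 0.8, sink_hits)


def narrative_is_consistent(narrative: str, verdict: Verdict) -> bool:
    """Guardrail: the narrative must agree with the verdict and cite its evidence."""
    text = narrative.lower()
    claims_exploitable = "exploitable" in text and "not exploitable" not in text
    if claims_exploitable != verdict.exploitable:
        return False
    # If evidence exists, the narrative has to reference at least one citation.
    return not verdict.citations or any(c in narrative for c in verdict.citations)
```

A narrative that fails the guardrail would be discarded or regenerated, so the deterministic verdict always wins over the generated text.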

The tool features a dual-layer caching mechanism: one layer caches embeddings to speed up repeated semantic lookups, and another caches scan results to avoid redundant analysis during iterative runs. This helps manage the computational cost of dealing with large codebases and multiple LLM queries.
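One way to picture the two layers is the in-memory sketch below. It is an illustration, not OASIS's cache implementation: the `DualCache` class, the content-hash keys, and the `embed` and `scan` callables are assumptions made for this example, and a real tool would persist both layers to disk.

```python
import hashlib
from typing import Any, Callable


def content_key(text: str) -> str:
    """Hash file content so unchanged files map to the same cache key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DualCache:
    """Two independent layers: embeddings by content, scan results by content+model+type."""

    def __init__(
        self,
        embed: Callable[[str], list[float]],
        scan: Callable[[str, str, str], Any],
    ) -> None:
        self._embed = embed
        self._scan = scan
        self.embeddings: dict[str, list[float]] = {}
        self.results: dict[tuple[str, str, str], Any] = {}

    def embedding(self, source: str) -> list[float]:
        key = content_key(source)
        if key not in self.embeddings:
            self.embeddings[key] = self._embed(source)  # computed once per content
        return self.embeddings[key]

    def scan(self, source: str, model: str, vuln_type: str) -> Any:
        key = (content_key(source), model, vuln_type)
        if key not in self.results:
            self.results[key] = self._scan(source, model, vuln_type)
        return self.results[key]
```

Keeping the layers separate means that switching the analysis model invalidates scan results but leaves the embeddings reusable.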

OASIS includes a password-protected web dashboard that provides a retrieval-augmented generation (RAG) powered assistant. This interface allows users to interact with scan results, ask questions, and get explanations contextualized by the codebase and findings.

Reports are exported in a canonical JSON format, with derived HTML, PDF, and Markdown versions for easy sharing and review. The system supports incremental reporting with live progress updates, which is important for long-running scans on large projects.
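The "canonical JSON, derived formats" idea can be sketched as follows. The report schema here (`title`, `findings`, `path`, `vuln_type`, `confidence`) is invented for illustration and should not be read as OASIS's real report format; the point is only that every other format is rendered from the JSON rather than built independently.

```python
import json


def to_canonical_json(report: dict) -> str:
    """Serialize with sorted keys so identical reports produce identical text."""
    return json.dumps(report, sort_keys=True, indent=2)


def render_markdown(canonical: str) -> str:
    """Derive a Markdown view from the canonical JSON, never from raw scan state."""
    report = json.loads(canonical)
    lines = [f"# {report['title']}", ""]
    for finding in report["findings"]:
        lines.append(
            f"- **{finding['vuln_type']}** in `{finding['path']}` "
            f"(confidence {finding['confidence']:.2f})"
        )
    return "\n".join(lines)
```

Because the JSON is the single source of truth, the HTML, PDF, and Markdown views cannot drift apart from one another.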

The architecture relies heavily on Python, Ollama for running local LLMs, and LangGraph for orchestrating multi-agent workflows. It expects substantial hardware resources, especially for larger models and bigger codebases.

Technical strengths and tradeoffs in OASIS

What sets OASIS apart is the hybrid approach of combining deterministic validation with LLM-driven scanning. Many AI-powered security tools rely heavily on LLM outputs, which can be noisy or overly imaginative. By constraining the narrative to be consistent with deterministic checks, OASIS strikes a balance that reduces false positives without sacrificing the depth of insight that LLMs provide.

The two-phase scanning strategy is another practical design choice. Lightweight models quickly surface potential issues, reducing the scope for heavier model analysis, which is costlier and slower. This staged approach optimizes resource usage while maintaining thoroughness.

The dual caching layers address a common bottleneck in AI-driven pipelines: repeated computation. By caching embeddings and scan results separately, OASIS avoids redundant LLM calls, which can be expensive and slow, especially without a GPU.

The password-protected web dashboard with its RAG-powered assistant adds a layer of interactivity that many CLI-only tools lack. Users can query findings in natural language, backed by relevant code context and prior analysis, which makes the results easier to understand and act on.

However, the tradeoffs are clear. The hardware requirements are high: a minimum of 4 CPU cores and 16 GB RAM, with 32 GB or more recommended. For projects of 100k+ lines of code, a high-end CPU, 64 GB RAM, and a dedicated GPU become essential. Model downloads alone can be several gigabytes each. Enterprise users are expected to run this on servers with 128 GB+ RAM and NVIDIA A100/H100 GPUs.

The reliance on Ollama models means initial setup and model management overhead, and potentially slower CPU-only inference. The tool is not lightweight and is best suited for serious security audits rather than quick checks.

The code is surprisingly clean for a multi-agent orchestrated system, with clear separation between scanning, validation, caching, and reporting components. The LangGraph pipeline defines the flow explicitly, aiding maintainability and extensibility.

Quick start with OASIS

Prerequisites

Python 3.9+
Ollama installed and running; you must pull the models you need before scanning.
pipx (recommended for installing the CLI)

Docker (optional)

From the repository root, assuming Ollama runs on the host:

docker compose build
docker compose run --rm oasis -i /work/test_files -ol http://host.docker.internal:11434

Code is mounted at /work; use -i paths under /work. Additional options, including bundled Ollama, dashboard, and docker run variants, are detailed in the project’s Run with Docker documentation.

Hardware requirements summary

Minimum: 4+ CPU cores, 16 GB RAM, 100 GB+ storage
Recommended: 8+ cores, 32-64 GB RAM, SSD storage, NVIDIA GPU with 8+ GB VRAM
Scaling: For 100k+ lines of code, high-end CPU, 64+ GB RAM, and dedicated GPU are essential
Model sizes dictate GPU VRAM: 4-8B params need 8 GB, 12-20B need 16 GB, 30B+ need 24 GB+ VRAM

This setup is not trivial but aligns with the resource needs of current LLM-based security tooling.
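The VRAM tiers in the summary above can be turned into a small lookup helper. This is a sketch, not part of OASIS: the function name is hypothetical, and since the summary leaves gaps between the listed ranges (for example 9B or 25B parameters), the helper assumes such sizes round up to the next tier.

```python
def min_vram_gb(params_billion: float) -> int:
    """Return the VRAM tier (in GB) for a model size, per the summary's ranges.

    Assumption: sizes between the listed ranges round up to the next tier,
    and the 24 GB value is a floor for 30B+ models, not an exact figure.
    """
    if params_billion <= 0:
        raise ValueError("model size must be positive")
    if params_billion <= 8:
        return 8
    if params_billion <= 20:
        return 16
    return 24
```

For example, a 7B model lands in the 8 GB tier and a 13B model in the 16 GB tier.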

Verdict

OASIS is a serious tool for security engineers and researchers who want to integrate AI-driven vulnerability scanning with deterministic, citation-backed validation to reduce noise and false positives. Its multi-agent LangGraph orchestration and two-phase scanning approach are pragmatic for balancing thoroughness with resource efficiency.

The hardware requirements and setup complexity mean it’s best suited for medium to large codebases and teams ready to invest in the infrastructure. Smaller projects or those seeking lightweight tools might find it overkill.

Its deterministic validation agent is a noteworthy pattern for future AI security tools, demonstrating how to combine code-driven logic with LLM narratives under guardrails.

If you’re exploring AI-assisted security audits and have the resources to run large models locally via Ollama, OASIS is worth a close look.


→ GitHub Repo: psyray/oasis ⭐ 310 · Python

Noureddine RAMDI / OASIS: a Python CLI for AI-driven code vulnerability scanning with deterministic validation