Observing AI agents at scale with opsrobot: a Vector-based telemetry pipeline for OpenClaw workflows

Observability for AI agent workflows is a growing need as multi-agent systems become more complex and opaque. opsrobot-ai/opsrobot addresses this with a full-stack platform that captures detailed telemetry from OpenClaw AI agents through a Vector-based data pipeline. The notable design choice is that it converts the raw JSON logs that running agents already write into structured observability data, so the agent runtime needs no intrusive changes. The project positions this as continuous visibility into agent sessions, security controls aimed at enterprise use, and cost accounting tied to LLM token consumption.

How opsrobot structures observability for AI agent workflows

At its core, opsrobot is built around OpenClaw, an extensible AI agent framework. The platform combines established observability standards with analytical storage: the OpenTelemetry (OTel) protocol for tracing, eBPF for capturing system-level events, and Apache Doris as an OLAP database for analytical querying.

The architecture is a classic full-stack web app with a React/Vite frontend dashboard and a Node.js backend serving REST APIs on port 8787. Data flows from OpenClaw agent logs, which are JSON or JSONL files, through collectors running Vector (a high-performance observability data pipeline tool) and into Apache Doris.

Vector acts as a bridge and transformer, parsing raw log files into structured streams that Doris can ingest efficiently. This design means the OpenClaw agents themselves remain unmodified; the telemetry data is gathered and processed externally, which reduces runtime overhead and complexity.
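To make the parsing step concrete, here is a minimal Python sketch of turning raw JSONL lines into fixed-shape rows. It is not opsrobot's code and not how Vector is implemented; the field names (session_id, type, timestamp, usage) are assumptions chosen for illustration, since the article does not document the OpenClaw log schema.

```python
import json
from typing import Any, Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one dict per valid JSON object line, skipping blank and malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line that was only partially written
        if isinstance(record, dict):
            yield record


def to_row(record: dict[str, Any]) -> dict[str, Any]:
    """Flatten a raw event into fixed columns (hypothetical schema)."""
    usage = record.get("usage") or {}
    return {
        "session_id": record.get("session_id", ""),
        "event_type": record.get("type", "unknown"),
        "timestamp": record.get("timestamp", ""),
        "input_tokens": int(usage.get("input_tokens", 0)),
        "output_tokens": int(usage.get("output_tokens", 0)),
    }


# Invented sample lines: one complete event, one blank, one truncated, one minimal.
raw = [
    '{"session_id": "s1", "type": "llm_call", "timestamp": "2025-01-01T00:00:00Z", '
    '"usage": {"input_tokens": 120, "output_tokens": 30}}',
    "",
    '{"session_id": "s1", "type": "tool_call"',
    '{"session_id": "s2", "type": "message"}',
]
rows = [to_row(r) for r in parse_jsonl(raw)]
```

Giving every row the same columns, with defaults for missing fields, is what lets a columnar store ingest the stream without per-event special cases.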

The backend handles API requests from the frontend dashboard, delivering insights such as session traces, audit logs, gateway logs, and cost metrics. This layered approach supports three major capabilities:

Execution transparency: Real-time and historical monitoring of AI agent sessions, from pre-event conditions through post-event analysis.
Security controls: Authorization, compliance validation, and audit trails built into the platform.
Cost accounting: Mapping token usage from LLM calls to business ROI metrics.

This combination of observability and governance is important for enterprises running AI workflows at scale.
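The cost accounting capability comes down to multiplying token counts by a price and rolling the result up per session. The sketch below shows that arithmetic in Python under stated assumptions: the model name and the per-million-token prices are hypothetical placeholders, and the LlmCall record shape is invented for the example rather than taken from opsrobot.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Iterable

# Hypothetical prices in USD per 1,000,000 tokens: (input, output).
PRICES = {"example-model": (Decimal("3.00"), Decimal("15.00"))}


@dataclass(frozen=True)
class LlmCall:
    session_id: str
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(call: LlmCall) -> Decimal:
    """Cost of a single LLM call in USD."""
    in_price, out_price = PRICES[call.model]
    total = call.input_tokens * in_price + call.output_tokens * out_price
    return total / Decimal(1_000_000)


def cost_by_session(calls: Iterable[LlmCall]) -> dict[str, Decimal]:
    """Sum call costs per session, similar in spirit to a GROUP BY session_id."""
    totals: dict[str, Decimal] = {}
    for call in calls:
        totals[call.session_id] = totals.get(call.session_id, Decimal(0)) + call_cost(call)
    return totals


calls = [
    LlmCall("s1", "example-model", 1_000_000, 0),
    LlmCall("s1", "example-model", 0, 200_000),
    LlmCall("s2", "example-model", 500_000, 100_000),
]
totals = cost_by_session(calls)
```

Decimal is used instead of float so that small per-call amounts do not accumulate rounding error when summed over many calls.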

The Vector-driven data pipeline: transforming raw logs into actionable observability

What distinguishes opsrobot is its data pipeline architecture based on Vector. Most observability setups require agents or instrumented SDKs to emit telemetry, but OpenClaw agents produce local JSON log files. Vector collects these logs by tailing files or running shell commands to read session data.

The Vector configuration (vector.yaml) defines multiple sinks, such as session_to_doris, session_logs_to_doris, and audit_logs_to_doris, each targeting a Doris Stream Load endpoint. Stream Load is Doris's HTTP ingestion interface, so every sink delivers its structured observability records with a plain HTTP request.
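For readers unfamiliar with Doris Stream Load, the following Python sketch builds the kind of HTTP request such a sink would send. It only constructs the request and does not send it. The URL path and the format, read_json_by_line, label, and Expect headers follow the Doris Stream Load documentation; the database name opsrobot, the table name agent_sessions, the credentials, and the label prefix are assumptions made for this example, not values from the project's vector.yaml.

```python
import base64
import json
import urllib.request
import uuid


def build_stream_load_request(fe_url, database, table, rows, user, password):
    """Build (but do not send) a Doris Stream Load request carrying JSON lines."""
    # One JSON object per line, matching read_json_by_line below.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "format": "json",
        "read_json_by_line": "true",
        # A unique label lets Doris reject an accidental replay of the same batch.
        "label": f"opsrobot-{uuid.uuid4().hex}",
    }
    url = f"{fe_url}/api/{database}/{table}/_stream_load"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


# Hypothetical database, table, and credentials; 8030 is the default FE HTTP port.
sample_rows = [
    {"session_id": "s1", "event_type": "llm_call"},
    {"session_id": "s2", "event_type": "message"},
]
req = build_stream_load_request(
    "http://localhost:8030", "opsrobot", "agent_sessions", sample_rows, "root", ""
)
```

A real client also has to follow the redirect that a Doris front end typically issues toward a back-end node, which urllib does not do for PUT requests by default. Handling that detail is one reason to leave delivery to Vector's HTTP sink.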

On the source side, Vector monitors the OpenClaw logs directory, running shell commands to concatenate or flush JSON objects into streams. This allows real-time ingestion without modifying agent code or runtime behavior.
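The core idea behind tailing a log file is to remember a byte offset and, on each poll, read only the complete lines appended after it. The Python sketch below illustrates that idea. It is a simplified stand-in, not Vector's file source, and the function name read_new_records and the sample records are invented for the example.

```python
import json
import os
import tempfile


def read_new_records(path, offset):
    """Return (records, new_offset) for complete JSON lines appended after offset."""
    records = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line still being written; retry on next poll
            offset += len(line)
            text = line.strip()
            if text:
                try:
                    records.append(json.loads(text))
                except json.JSONDecodeError:
                    pass  # skip a malformed line but still advance past it
    return records, offset


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "session.jsonl")
    # The agent has written one full line and is midway through a second.
    with open(log_path, "w", encoding="utf-8") as f:
        f.write('{"seq": 1}\n{"seq": 2')
    first, pos = read_new_records(log_path, 0)
    # The agent finishes the second line and appends a third.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write('}\n{"seq": 3}\n')
    second, pos = read_new_records(log_path, pos)
```

Because the offset only advances past newline-terminated lines, a half-written record is never ingested and is picked up whole on the next poll. The agent keeps writing its log as usual and needs no changes.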

Under the hood, Vector can parse and transform JSON/JSONL data, enrich it if needed, and handle high throughput with low resource consumption. This is critical for large-scale AI deployments where telemetry volume can be massive.

The backend and Doris OLAP database then provide powerful querying and aggregation capabilities, enabling detailed tracing of multi-agent sessions, cost analyses, and compliance reporting.

The tradeoff is operational complexity: Vector collectors must be deployed and configured on every host running OpenClaw agents, which requires care. The system also relies on several external components (Doris, Vector, and the Node.js backend), which increases the deployment footprint.

Quick start with opsrobot

The project includes a detailed quickstart using Docker Compose and Node.js 18+.

# 1. Environment requirements
# - Docker Desktop with Docker Compose plugin
# - Node.js 18+

# 2. Clone the project
git clone https://github.com/opsrobot-ai/opsrobot.git
cd opsrobot

# 3. Deploy backend services
docker compose -f docker-compose.yml up -d

# Then access the dashboard at http://localhost:3000

Vector must be installed and configured on each machine running OpenClaw so that it can collect the agent logs and forward them to the ingestion endpoints.

For macOS:

brew tap vectordotdev/brew && brew install vector

For CentOS and Ubuntu, the README provides curl commands to install Vector and shows how to configure vector.yaml to point to the appropriate Doris ingestion endpoints.

This makes opsrobot practical to spin up in a real environment, though the multiple components and configuration steps require operational know-how.

Verdict

opsrobot is a solid example of building production-grade observability for AI agent workflows without embedding telemetry directly in the agents. The Vector-based data pipeline is the centerpiece, enabling efficient transformation of raw JSON logs into structured analytics data.

The project suits teams running complex OpenClaw-based multi-agent systems who need deep visibility into execution, security, and cost metrics. Its use of modern technologies like eBPF, OTel, Apache Doris, and Vector shows careful engineering choices suited for scalability.

The main limitation is operational complexity: deploying and managing multiple services and configuring Vector collectors on all agent hosts is not trivial. But for organizations needing enterprise-grade observability of AI workflows, opsrobot provides a practical, extensible foundation.

If you want to understand how to externalize observability from AI agents and build a scalable telemetry pipeline, opsrobot is worth a close look.

→ GitHub Repo: opsrobot-ai/opsrobot ⭐ 129 · JavaScript

Noureddine RAMDI / Observing AI agents at scale with opsrobot: a Vector-based telemetry pipeline for OpenClaw workflows
