Noureddine RAMDI / Tachyon: sub-50ns cross-language IPC with zero-copy shared memory

Created Mon, 04 May 2026 10:23:01 +0000 Modified Sat, 23 May 2026 20:41:27 +0000

riyaneel/Tachyon

Tachyon is a same-machine inter-process communication (IPC) library that brings round-trip latency below 50 nanoseconds while offering bindings for eight programming languages. It achieves this by sidestepping serialization altogether and exchanging data through a shared memory ring buffer with zero-copy semantics. According to the project, this makes it 3-5x faster than established alternatives such as iceoryx and Aeron.

What Tachyon is and how it works

At its core, Tachyon implements a single-producer-single-consumer (SPSC) lock-free ring buffer in shared memory. Unlike traditional IPC mechanisms that rely on sockets, pipes, or brokers, Tachyon requires only two cooperating processes that share a memory region mapped into both address spaces. This design eliminates the overhead of copying data between user space and kernel space, and the costly serialization/deserialization steps that usually plague cross-language communication.

The project supports bindings for Python, Node.js, Java, Kotlin, Rust, Go, C#, and C++. This cross-language support is essential for heterogeneous systems, such as machine learning inference pipelines or real-time audio/video processing setups, where components written in different languages must exchange data rapidly.

A key feature is its support for DLPack, a standard tensor exchange protocol that allows zero-copy sharing of tensor data structures between frameworks like PyTorch and NumPy. This means Python processes can directly access tensors produced by a C++ process without serialization or memcpy, which is critical for low-latency ML inference chains.

Under the hood, Tachyon relies on Linux kernel features: memfd to allocate anonymous shared memory that has no path in the filesystem, and SCM_RIGHTS to pass the resulting file descriptor from one process to another over a Unix domain socket. This makes it primarily a Linux library, with tier-2 support for macOS 13+ and no Windows support due to platform limitations.

Technical strengths and tradeoffs

Tachyon’s standout technical strength is its median round-trip latency of 49.9 ns (p50), an order of magnitude lower than that of many IPC libraries relying on serialization or kernel-mediated transport. The lock-free ring buffer keeps synchronization overhead minimal, and the zero-copy design means data stays in shared memory without redundant copies.

Benchmarks on an i7-12650H with DDR5 memory show a throughput of roughly 15.08 million round trips per second (15,077 K/s), with p99 latency still under 103 ns when the benchmark runs under the SCHED_FIFO real-time scheduling policy. These figures make it suitable for latency-critical domains such as high-frequency trading feeds or real-time media streaming.
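As a rough consistency check, assume the throughput figure comes from the same run with one round trip in flight at a time (the benchmark methodology is not described here). Under that assumption, the implied mean time per round trip is:

```latex
\bar{t}_{\text{RTT}} = \frac{1}{15{,}077 \times 10^{3}\ \text{round trips/s}} \approx 6.63 \times 10^{-8}\ \text{s} \approx 66.3\ \text{ns}
```

That value sits between the 49.9 ns median and the sub-103 ns p99, which fits a distribution where a minority of slower round trips pulls the mean above the median. If the benchmark keeps several messages in flight, this inference does not hold.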

The codebase emphasizes minimal dependencies and a performance-focused design. The core is written in clean, modern C++ and therefore requires a recent compiler (GCC 14+ or Clang 17+). The cross-language bindings are well maintained, but adding a new language means implementing the ring buffer interface and the memory-mapping logic for it.

The tradeoff is clear: Tachyon excels where latency is king and the deployment environment is controlled (Linux/macOS), but it is not a general-purpose IPC solution. The shared memory approach leaves careful management of lifetimes, synchronization, and error handling to users. And because nothing is serialized, the exchanged data structures must have a binary layout that every participating language agrees on, or be marshalled outside Tachyon, which limits flexibility compared with more generic message brokers.

Quick start

The library provides straightforward installation commands for multiple languages, compiling the C++ core at install time where relevant.

Python installation:

pip install tachyon-ipc

Node.js installation:

npm install @tachyon-ipc/core

Java (Maven):

<dependency>
    <groupId>dev.tachyon-ipc</groupId>
    <artifactId>tachyon-java</artifactId>
    <version>0.4.2</version>
</dependency>

Kotlin (Gradle):

implementation("dev.tachyon-ipc:tachyon-kotlin:0.4.2")

Rust:

cargo add tachyon-ipc

Go:

go get github.com/riyaneel/tachyon/bindings/go@v0.4.2

C#:

dotnet add package TachyonIpc

C++ (CMake FetchContent):

include(FetchContent)

# FetchContent_MakeAvailable replaces the single-argument FetchContent_Populate()
# pattern, which is deprecated since CMake 3.30 (policy CMP0169).
# SOURCE_SUBDIR (CMake 3.18+) adds only the core/ directory of the repository.
FetchContent_Declare(tachyon
    GIT_REPOSITORY https://github.com/riyaneel/tachyon.git
    GIT_TAG v0.4.2
    SOURCE_SUBDIR core
)
FetchContent_MakeAvailable(tachyon)

target_link_libraries(my_app PRIVATE tachyon)

The README provides a minimal usage example for Python, demonstrating how to spawn two processes communicating over the shared ring buffer. Users should note the OS requirements: Linux 5.10+ primarily, macOS 13+ as tier-2, and no Windows support due to POSIX-specific system calls.

Verdict

Tachyon is a solid choice if you need ultra-low latency IPC across languages on Linux or macOS and your workload involves large tensor data or latency-sensitive real-time streams. Its zero-copy shared memory ring buffer design is a textbook example of squeezing every nanosecond out of IPC.

That said, the approach demands a controlled environment and some systems programming discipline. You trade off ease of use and flexibility for performance and minimal overhead. If your use case involves heterogeneous processes exchanging large buffers quickly and predictably, Tachyon is worth a serious look.

For general IPC needs, or if you must support Windows, you’ll want to consider other tools. But for ML inference pipelines, high-frequency trading, or real-time media processing on supported platforms, Tachyon’s published numbers and architecture put it among the fastest cross-language IPC options currently available.


→ GitHub Repo: riyaneel/Tachyon ⭐ 260 · C++