Firmware reverse engineering on ARM Thumb2 architectures often hits a wall due to misidentified function boundaries. Tools like IDA Pro and Ghidra rely on disassembler heuristics that fail in tricky ARM Thumb2 binaries, corrupting call graphs and breaking downstream diffing tools like BinDiff or Diaphora. Polypyus takes a different route: it ignores disassembler function detection entirely and locates functions by matching binary patterns directly from known annotated firmware. This pragmatic binary-only approach sidesteps a major pain point in ARM firmware analysis.
What Polypyus is and how it works
Polypyus is a Python-based firmware historian designed specifically for ARM Thumb2 binaries. Its main goal is to identify function starts and boundaries by matching function signatures, extracted from annotated history binaries, against a raw target firmware image. Instead of relying on disassembler auto-analysis, which can misinterpret ARM Thumb2 instructions and generate incorrect function starts, Polypyus builds “fuzzy binary signatures” from known-good firmware versions, so a signature can still match when some bytes differ between builds.
The architecture centers on creating matchers from annotated history binaries. These annotations come from patch.elf files, .symdefs files, or CSV files containing symbol information. Polypyus then scans the target binary for byte sequences that fit the matchers built from known functions, which lets it find functions that other tools miss when their function detection goes wrong.
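The matching idea can be illustrated with a minimal sketch. It assumes a signature keeps every byte on which all known copies of a function agree and turns the remaining positions into wildcards. This is an illustration of the general technique, not Polypyus's actual implementation, and the function names (`build_signature`, `find_matches`) are hypothetical.

```python
import re
from typing import List, Optional, Sequence


def build_signature(versions: Sequence[bytes]) -> List[Optional[int]]:
    """Keep bytes shared by all copies; mark differing positions as None."""
    if not versions:
        raise ValueError("need at least one known copy of the function")
    length = len(versions[0])
    if any(len(v) != length for v in versions):
        raise ValueError("this sketch assumes copies of equal length")
    return [col[0] if len(set(col)) == 1 else None for col in zip(*versions)]


def find_matches(signature: Sequence[Optional[int]], target: bytes) -> List[int]:
    """Return every halfword-aligned offset in target that fits the signature."""
    body = b"".join(
        b"." if b is None else re.escape(bytes([b])) for b in signature
    )
    # A lookahead makes matches zero-width, so overlapping hits are not skipped.
    pattern = re.compile(b"(?=" + body + b")", re.DOTALL)
    # Thumb code is halfword-aligned, so odd offsets are discarded.
    return [m.start() for m in pattern.finditer(target) if m.start() % 2 == 0]


# Two copies of one function from hypothetical history builds; one byte of a
# branch offset differs between them.
v1 = bytes.fromhex("10b50446fff7aafe10bd")
v2 = bytes.fromhex("10b50446fff7c2fe10bd")
signature = build_signature([v1, v2])

# Hypothetical target: padding, then the function with yet another offset.
target = bytes.fromhex("00bf00bf" "10b50446fff712fe10bd" "00bf")
print(find_matches(signature, target))  # [4]
```

Wildcarding more positions raises the chance of a match in a new build but also the chance of a spurious one, which is the precision and recall tension a matcher of this kind has to balance.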
The repo is written in Python 3 (>=3.6) and uses no exotic dependencies beyond standard Python packaging tools. It provides both a command-line and a graphical interface (polypyus-cli and polypyus-gui), so it can fit into different reverse engineering workflows.
This approach positions Polypyus as a preprocessing step before running BinDiff or Diaphora, enabling these tools to operate on a more accurate function map. Polypyus exports its matches in a format importable by IDA, bridging the gap between raw binary matching and interactive disassembler environments.
What makes Polypyus stand out: binary-only fuzzy matching and precision
The core strength of Polypyus lies in its binary-only fuzzy matching strategy. Unlike conventional RE tools that analyze control flow graphs or disassembled code, Polypyus treats the binary as raw data and looks for byte patterns that fit function signatures derived from known history versions.
This design trades recall for precision, aiming to avoid false positives entirely. According to the published results, Polypyus found only correct matches in the authors' experiments, that is, zero false positives. This is a crucial property when function misidentification can cascade into serious analysis errors.
The tradeoff is clear: Polypyus may miss some functions (lower recall), but every function it reports is reliable, making it valuable as a complementary tool in ARM Thumb2 firmware RE.
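To make the two terms concrete, here is a small worked example. The counts are invented for illustration and are not results from the Polypyus paper.

```python
from typing import Optional, Tuple


def precision_recall(
    true_pos: int, false_pos: int, false_neg: int
) -> Tuple[Optional[float], Optional[float]]:
    """Precision: share of reported functions that are real.
    Recall: share of real functions that were reported."""
    reported = true_pos + false_pos
    existing = true_pos + false_neg
    precision = true_pos / reported if reported else None
    recall = true_pos / existing if existing else None
    return precision, recall


# Hypothetical run: 600 functions found, all correct, 400 functions missed.
precision, recall = precision_recall(true_pos=600, false_pos=0, false_neg=400)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.60
```

In this made-up case every reported function can be trusted, but 40% of the functions still have to be found by other means.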
Under the hood, the codebase is surprisingly clean and focused given the complexity of the problem domain. It leverages annotated symbol data to build matchers, then scans target binaries efficiently — reportedly running in a few seconds even on large firmware blobs.
This binary-only approach also neatly sidesteps the thorny problem of ARM Thumb2’s mixed 16- and 32-bit instruction encodings, which often confuse disassemblers. By not depending on instruction decoding for function detection, Polypyus avoids these pitfalls.
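The decoding problem can be shown in a few lines. In Thumb-2, a halfword whose top five bits are 0b11101, 0b11110, or 0b11111 is the first half of a 32-bit instruction; anything else is a complete 16-bit instruction. The sketch below walks instruction boundaries under that rule, assuming little-endian code; the helper names are made up for this example.

```python
import struct
from typing import List


def thumb_instruction_size(halfword: int) -> int:
    """Return 4 if this halfword opens a 32-bit instruction, else 2."""
    return 4 if (halfword >> 11) in (0b11101, 0b11110, 0b11111) else 2


def instruction_starts(code: bytes, start: int = 0) -> List[int]:
    """Walk the code linearly from start and collect instruction offsets."""
    offsets = []
    pos = start
    while pos + 2 <= len(code):
        (halfword,) = struct.unpack_from("<H", code, pos)
        offsets.append(pos)
        pos += thumb_instruction_size(halfword)
    return offsets


# 16-bit instruction, 32-bit instruction, 16-bit instruction.
code = bytes.fromhex("10b5" "fff7aafe" "10bd")
print(instruction_starts(code, 0))  # [0, 2, 6]
# Starting in the middle of the 32-bit instruction misreads its second half
# as a new 32-bit instruction and swallows the final 16-bit one.
print(instruction_starts(code, 4))  # [4]
```

A single wrong starting offset therefore shifts every boundary after it, which is the kind of error a raw byte matcher never has to make.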
The project was published at the Workshop on Binary Analysis Research (BAR) 2021, which speaks to its academic rigor and peer validation.
Quick start
Polypyus requires Python 3.6 or newer. The recommended installation method is via pip in a virtual environment:
pip install .
After installation, the following commands are available:
- polypyus-gui: graphical interface for interactive use
- polypyus-cli: command line interface for scripting and automation
For testing, you can install test dependencies with:
pip install '.[test]'
For development dependencies (e.g., type stubs), use:
pip install '.[development]'
This setup makes it straightforward to get Polypyus running and integrate it into your reverse engineering pipeline.
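Before wiring Polypyus into a scripted pipeline, it can help to confirm that the two entry points are actually on the PATH of the active environment. The snippet below is a small sketch using only the standard library; the helper name is hypothetical, and it deliberately does not invoke the tools or assume any of their command-line options.

```python
import shutil
from typing import Dict, Optional, Sequence


def find_polypyus_commands(
    names: Sequence[str] = ("polypyus-cli", "polypyus-gui"),
) -> Dict[str, Optional[str]]:
    """Map each command name to its resolved path, or None if not installed."""
    return {name: shutil.which(name) for name in names}


for name, path in find_polypyus_commands().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If either command is reported as missing, the usual cause is that the virtual environment used for `pip install .` is not the one currently activated.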
Verdict
Polypyus addresses a very specific and persistent pain point in ARM Thumb2 firmware reverse engineering: unreliable function boundary detection by disassemblers. Its binary-only fuzzy matching approach is not a replacement for full disassembly but a valuable preprocessing step that improves downstream diffing accuracy.
The tool is especially relevant for reverse engineers working with ARM Thumb2 firmware who rely on BinDiff or Diaphora for binary diffing and patch analysis. It trades off recall for precision, which is often the right choice in high-stakes firmware analysis where false positives can derail investigations.
Limitations include reduced recall and dependence on having annotated history binaries for matcher creation. It is not a general-purpose disassembler or RE platform but fits well into a pipeline aiming to overcome ARM Thumb2’s idiosyncrasies.
Overall, Polypyus is a pragmatic and well-engineered addition to the reverse engineering toolkit, worth exploring for anyone dealing with ARM Thumb2 firmware where function start detection is a bottleneck.
→ GitHub Repo: seemoo-lab/polypyus ⭐ 231 · Python