Boxer3D tackles a challenging problem: running real-time 3D object detection entirely on an iPhone with LiDAR. It combines 2D object detection and 3D lifting into oriented bounding boxes, all accelerated on-device using Apple’s Neural Engine and Metal.
What Boxer3D does: native iOS 3D object detection with LiDAR and deep learning
At its core, Boxer3D is a Swift app designed to perform 3D object detection using the iPhone’s LiDAR sensor. The pipeline begins with YOLO11n, a lightweight 2D detection model trained on COCO classes, which processes RGB input at 640×640 resolution to detect objects in 2D.
The novelty lies in the subsequent lifting of these 2D detections into 3D oriented bounding boxes (7 degrees of freedom: center (x,y,z), size (w,h,d), and yaw angle) using BoxerNet. BoxerNet is a Meta Research model that fuses visual features from DINOv3 with LiDAR depth information aggregated over 16×16 patches. This fusion helps estimate the 3D pose and size of objects reliably.
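To make the 7-DoF parameterization concrete, here is a minimal sketch that expands a (center, size, yaw) box into its eight corner points. The function name `box_corners`, the choice of the y axis as the vertical (gravity-aligned) rotation axis, and the mapping of (w, h, d) to local x, y, z are assumptions for illustration; the source does not state BoxerNet's axis conventions.

```python
import math

def box_corners(center, size, yaw):
    """Return the 8 corners of an oriented 3D box as (x, y, z) tuples.

    Assumptions (not confirmed by the source): yaw rotates about the
    vertical y axis, and size is (w, h, d) along local x, y, z.
    """
    cx, cy, cz = center
    w, h, d = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx in (-0.5, 0.5):
        for sy in (-0.5, 0.5):
            for sz in (-0.5, 0.5):
                # Offset of this corner from the center in the box's own frame.
                lx, ly, lz = sx * w, sy * h, sz * d
                # Rotate the offset about the y axis, then translate.
                x = cx + c * lx + s * lz
                z = cz - s * lx + c * lz
                corners.append((x, cy + ly, z))
    return corners
```

Because only yaw is free, the box always stays upright: height never mixes with the other two axes, which is why seven numbers are enough to describe it.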
Both models are exported to ONNX format and run through ONNX Runtime, which uses the CoreML Execution Provider to tap into Metal and the Neural Engine for acceleration. This setup keeps all inference on-device, which is key for real-time performance and privacy.
ARKit provides essential spatial context by supplying camera pose, intrinsics, and gravity vector, while SceneKit renders the output as wireframe boxes anchored accurately to the real world. This integration ensures the detections are spatially consistent and visually aligned.
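As an illustration of how camera intrinsics turn a pixel into a viewing ray, here is a minimal pinhole-model sketch. The function `pixel_to_ray` and its parameters are hypothetical, and it uses the common computer-vision convention (x right, y down, z forward). ARKit's own camera space uses different axis signs, so a real integration would need a sign flip that is not shown here.

```python
import math

def pixel_to_ray(u, v, fx, fy, cx, cy):
    """Unit direction through pixel (u, v) for a pinhole camera.

    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    Convention assumed here: x right, y down, z forward.
    """
    x = (u - cx) / fx
    y = (v - cy) / fy
    norm = math.sqrt(x * x + y * y + 1.0)
    return (x / norm, y / norm, 1.0 / norm)
```

A pixel at the principal point maps to the optical axis, and pixels farther from it map to rays tilted proportionally to their offset divided by the focal length.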
The app requires an iPhone 12 Pro or later because of the LiDAR sensor, and the combined model size (~450 MB) is non-trivial but manageable for modern devices.
Why Boxer3D’s approach stands out: 3D lifting with on-device acceleration and LiDAR fusion
What sets Boxer3D apart is the practical pipeline that lifts 2D detections into 3D bounding boxes using a dedicated deep model that fuses high-level visual features with LiDAR depth patches. This approach goes beyond typical 2D detection apps by integrating precise spatial depth data, which is critical for accurate 3D localization.
The choice of YOLO11n as the 2D detector is a tradeoff balancing accuracy, model size (10 MB), and input resolution (640×640). YOLO11n is small enough to run efficiently on-device while still handling 80 COCO classes.
BoxerNet, at 391 MB, is the heavier component. It takes as input the RGB image at a higher resolution (960×960), a depth tensor representing median depth per 16×16 patch, detected 2D boxes, and ray information from ARKit. It outputs the 3D box parameters with orientation and confidence scores.
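The per-patch depth input can be sketched as follows. This is an illustrative version only: the function name `patch_median_depth`, the treatment of non-positive readings as invalid, and the fallback value for empty patches are assumptions, and the source does not say how the app resamples the LiDAR depth map before pooling. If the depth map were aligned to the 960×960 image, 16×16 patches would give a 60×60 grid.

```python
from statistics import median

def patch_median_depth(depth, patch=16, invalid=0.0):
    """Pool a depth map (list of rows, in metres) into per-patch medians.

    Non-positive readings are skipped as invalid (an assumption); a patch
    with no valid readings gets the `invalid` placeholder value.
    """
    rows = len(depth) // patch
    cols = len(depth[0]) // patch
    grid = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            vals = [
                depth[y][x]
                for y in range(r * patch, (r + 1) * patch)
                for x in range(c * patch, (c + 1) * patch)
                if depth[y][x] > 0.0
            ]
            out_row.append(median(vals) if vals else invalid)
        grid.append(out_row)
    return grid
```

A median is a reasonable pooling choice for depth because a few stray readings at an object boundary do not drag the patch value toward the background the way a mean would.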
Running both models through ONNX Runtime with the CoreML Execution Provider is a practical choice. It taps Apple’s hardware acceleration without converting the models to Core ML format or writing custom Metal kernels. The integration with ARKit for pose and SceneKit for visualization shows attention to the full stack, from sensor input to rendered output.
However, this setup has limitations. The models need around 450 MB of storage, which is significant for a mobile app, and the reliance on the LiDAR sensor restricts the app to LiDAR-equipped iPhones (12 Pro and later Pro models). Inference speed and power consumption under real-time load are not documented, and both matter for production use.
The codebase is predominantly Swift, with ONNX Runtime pulled in as a Swift Package Manager (SPM) dependency. The architecture cleanly separates detection, 3D lifting, and rendering, which should make customization or experimentation straightforward.
Quick start: running Boxer3D on your iPhone
The README provides clear steps to get started:
- Clone the repo:
git clone git@github.com:Barath19/Boxer3D.git
cd Boxer3D
- Download the models from Hugging Face:
pip install huggingface_hub
huggingface-cli download Barath/boxer3d --local-dir boxer/
This places BoxerNet.onnx (~391 MB) and yolo11n.onnx (~10 MB) inside the boxer/ directory.
- Open the Xcode project:
open boxer.xcodeproj
Xcode will resolve the ONNX Runtime dependency automatically.
- Build and run on an iPhone 12 Pro or later (with iOS 16+): Use Cmd+R in Xcode.
These instructions are straightforward for iOS developers familiar with Xcode and model deployment. The dependency on Python and Hugging Face CLI for model download is a minor extra step but standard for model-heavy projects.
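Since a missing model file only surfaces once the app tries to load it, a quick check before building can save a round trip. The sketch below is a hypothetical helper, not part of the repository; it only assumes the two file names and the boxer/ directory mentioned above.

```python
from pathlib import Path

# File names as listed in the download step above.
EXPECTED_MODELS = ("BoxerNet.onnx", "yolo11n.onnx")

def missing_models(model_dir="boxer"):
    """Return the expected model files that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_MODELS if not (root / name).is_file()]
```

Calling `missing_models()` from the repository root returns an empty list when both downloads succeeded.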
Verdict: who should explore Boxer3D
Boxer3D is a solid example of bringing research-grade 3D object detection to iOS devices with LiDAR. It’s relevant for mobile AI researchers, AR developers, and anyone interested in on-device 3D vision pipelines. The combination of YOLO11n and BoxerNet with ONNX Runtime acceleration shows a practical path to real-time 3D detection without offloading compute.
The tradeoffs are clear: it requires newer hardware with LiDAR and consumes substantial storage. For production use, evaluation of inference speed and power impact is needed, but as a research and demo project, it’s impressive.
If you are building AR apps that need spatially aware object detection or experimenting with LiDAR and 3D vision on iOS, Boxer3D offers a valuable reference implementation with clean integration of detection, 3D lifting, and rendering.
The codebase is worth exploring for anyone interested in ONNX Runtime on iOS, Neural Engine acceleration, and combining deep learning with ARKit for spatial understanding.
→ GitHub Repo: Barath19/Boxer3D ⭐ 371 · Swift