Inside ToddlerBot: an open-source Python platform for multi-skill humanoid locomotion with depth-based skill classification

ToddlerBot tackles a fundamental challenge in humanoid robotics: enabling a robot to autonomously select and perform multiple locomotion skills based on its perception of the environment. What sets ToddlerBot apart is its end-to-end pipeline that integrates stereo depth estimation with a skill classifier that dynamically triggers reinforcement learning policies on a physical robot. This is all done in Python, making it accessible for robotics researchers and developers who want to dive into whole-body locomotion beyond just foot placement.

what ToddlerBot is and how it works

ToddlerBot is an open-source humanoid robot platform designed for scalable policy learning and robotics research. It’s built entirely in Python (3.10+), which is notable given how many robotics stacks rely on C++ or ROS middleware. The repo covers the whole pipeline: low-level robot control, RL policy training in simulation with MuJoCo/MJX, skill classification from stereo depth data, and real-world deployment.

At the heart of the platform is a multi-skill whole-body locomotion system introduced in the 2026 “Locomotion Beyond Feet” release. Instead of manually switching between predefined behaviors, the system continuously classifies the robot’s context from egocentric stereo depth images. Based on the classification, it autonomously selects and executes the appropriate locomotion skill learned via RL.
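The classify-then-dispatch idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ToddlerBot’s code: DepthFrame, classify_skill, POLICIES, control_step and the skill labels "walk" and "climb" are all hypothetical, and a hand-written threshold rule stands in for the learned classifier.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins: DepthFrame, classify_skill, POLICIES and the
# skill labels "walk"/"climb" are illustrative, not ToddlerBot's API.


@dataclass
class DepthFrame:
    rows: list[list[float]]  # depth in metres, row-major


def classify_skill(frame: DepthFrame) -> str:
    # Placeholder rule standing in for the learned classifier: if anything
    # in the lower half of the image is closer than 0.5 m, pick "climb".
    lower_half = frame.rows[len(frame.rows) // 2:]
    nearest = min(min(row) for row in lower_half)
    return "climb" if nearest < 0.5 else "walk"


# Each policy maps a proprioceptive observation to joint targets.
Policy = Callable[[list[float]], list[float]]
POLICIES: dict[str, Policy] = {
    "walk": lambda obs: [0.0 for _ in obs],
    "climb": lambda obs: [0.1 * x for x in obs],
}


def control_step(frame: DepthFrame, proprio: list[float]) -> tuple[str, list[float]]:
    # Perception picks the skill; the matching policy produces the action.
    skill = classify_skill(frame)
    return skill, POLICIES[skill](proprio)
```

In the real system the classifier and the policies are learned networks. What the sketch preserves is the data flow: a depth frame goes in, a skill label comes out, and the label selects which policy turns proprioceptive observations into joint targets.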

The architecture comprises several key components:

Keyframe animation tools: These allow users to create reference motions that serve as starting points for training RL policies.
RL training scripts: These train a separate policy for each locomotion skill in simulation, using MuJoCo and its JAX-based MJX backend.
Skill classifier: Trained on depth maps derived from stereo RGB frames collected on the robot, it predicts which skill the robot should execute.
Depth estimation server: Runs in real time on the robot, continuously turning stereo image pairs into depth maps.
Multi-policy execution: A controller that loads multiple trained policies, receives skill predictions, and commands the robot accordingly.
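The depth estimation server’s script is named run_foundation_stereo.py, which suggests it wraps a learned stereo model. If that model outputs disparity, the last step to metric depth for a rectified stereo pair is the standard relation Z = f·B/d: focal length in pixels times baseline in metres, divided by disparity in pixels. The sketch below shows only that conversion; the function name and the focal length and baseline values are placeholders, not ToddlerBot’s calibration.

```python
import numpy as np


def disparity_to_depth(disparity: np.ndarray, focal_px: float, baseline_m: float,
                       min_disparity: float = 1e-6) -> np.ndarray:
    """Convert a rectified-stereo disparity map (pixels) to depth (metres).

    Uses Z = f * B / d. Pixels with (near-)zero disparity have no valid
    match and are set to +inf.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > min_disparity
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth


# Placeholder intrinsics (hypothetical): replace with calibrated values.
FOCAL_PX = 500.0
BASELINE_M = 0.06
```

Marking unmatched pixels as infinitely far, rather than zero, keeps them from looking like an obstacle directly in front of the robot.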

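The stack builds on Python’s ecosystem and is pip-installable, which lowers the barrier to entry. MuJoCo/MJX supplies the physics simulation for RL training. MuJoCo has been free and open source (Apache 2.0) since 2022, so the cost here is setup rather than licensing: MJX runs on JAX and is meant to run on a GPU- or TPU-enabled JAX install for fast, massively parallel training.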

what makes ToddlerBot’s technical approach interesting

The standout feature is the seamless integration of perception and control in a multi-skill locomotion pipeline. The robot is not just following scripted sequences or a single trained policy — it dynamically switches policies based on real-time classified sensory input.

The skill classifier’s training data comes from the robot’s own stereo RGB cameras. The raw frames are processed offline into depth maps, which are then used to train a classifier that predicts which locomotion skill suits the current scene. This approach avoids reliance on external motion capture or expensive sensors, favoring a more scalable and practical solution.
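The article does not describe the classifier’s architecture, so the sketch below deliberately swaps in the simplest possible model, a nearest-centroid classifier over average-pooled depth maps, just to make the input and output concrete: a depth map in, a skill label out. The class name, pooling factor, depth clip and labels are all hypothetical; the model actually trained by train.py is not shown here.

```python
import numpy as np


class NearestCentroidSkillClassifier:
    """Toy depth-map classifier: one mean feature vector per skill label.

    Hypothetical stand-in for illustration only; it is not the model
    trained by ToddlerBot's train.py.
    """

    def __init__(self, factor: int = 4, max_depth_m: float = 3.0):
        self.factor = factor
        self.max_depth_m = max_depth_m

    def _features(self, depth: np.ndarray) -> np.ndarray:
        # Treat invalid (nan/inf) pixels as "far", clip, then average-pool
        # factor x factor blocks so small misalignments matter less.
        d = np.nan_to_num(depth, nan=self.max_depth_m, posinf=self.max_depth_m)
        d = np.clip(d, 0.0, self.max_depth_m)
        h, w = (s // self.factor for s in d.shape)
        d = d[: h * self.factor, : w * self.factor]
        return d.reshape(h, self.factor, w, self.factor).mean(axis=(1, 3)).ravel()

    def fit(self, depths, labels):
        # One centroid per label, computed from that label's feature vectors.
        feats = np.stack([self._features(d) for d in depths])
        label_arr = np.asarray(labels)
        self.labels_ = sorted(set(labels))
        self.centroids_ = np.stack(
            [feats[label_arr == lab].mean(axis=0) for lab in self.labels_]
        )
        return self

    def predict(self, depth: np.ndarray) -> str:
        # Return the label whose centroid is closest in Euclidean distance.
        dists = np.linalg.norm(self.centroids_ - self._features(depth), axis=1)
        return self.labels_[int(np.argmin(dists))]
```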

Training each skill separately as an RL policy in MuJoCo/MJX is a pragmatic choice. It allows each locomotion behavior to be developed and tuned on its own, and the classifier then composes these policies at runtime. This sidesteps the complexity of training a single monolithic policy that must handle every skill.
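Composing separate policies at runtime raises a practical question the article does not address: what happens when the classifier’s prediction flickers between two skills from frame to frame. One common remedy, shown here as an assumption rather than as ToddlerBot’s behavior, is to switch only when a new skill wins a clear majority over a short window of recent predictions. SkillSwitcher and its window and min_votes parameters are hypothetical.

```python
from collections import Counter, deque


class SkillSwitcher:
    """Hypothetical debouncer: change the active skill only on a clear majority."""

    def __init__(self, initial_skill: str, window: int = 5, min_votes: int = 4):
        if not 1 <= min_votes <= window:
            raise ValueError("min_votes must be between 1 and window")
        self.active = initial_skill
        self._history: deque[str] = deque(maxlen=window)  # recent predictions
        self.min_votes = min_votes

    def update(self, predicted: str) -> str:
        # Record the newest prediction, then switch only if the most common
        # skill in the window differs from the active one and has enough votes.
        self._history.append(predicted)
        skill, votes = Counter(self._history).most_common(1)[0]
        if skill != self.active and votes >= self.min_votes:
            self.active = skill
        return self.active
```

Larger windows give steadier switching at the cost of reacting later to a genuine change in terrain, so the window size trades stability against reaction time.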

The codebase is well organized for a robotics project of this scope. Using Python for everything from low-level control to training and deployment keeps the whole stack in one language, which favors developer experience, and the included utility scripts and tests point to attention to reliability.

However, there are tradeoffs. The reliance on MuJoCo/MJX means users must set up the simulator and a working JAX environment, ideally with GPU support. The depth estimation server and skill classifier add latency and complexity, which may be a problem for control loops with tight latency budgets. And real-world deployment requires hardware capable of running the depth estimation server and multiple RL policies concurrently.
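One common way to keep slower perception from stalling a fast control loop is to run depth estimation and classification in their own thread or process and have the controller read only the most recent prediction, without waiting. The sketch below assumes that design; the article does not say how ToddlerBot schedules these components, and LatestValue, choose_skill and the "stand" fallback are hypothetical.

```python
import threading
import time
from typing import Generic, TypeVar

T = TypeVar("T")


class LatestValue(Generic[T]):
    """Thread-safe holder for the most recent value and when it was set."""

    def __init__(self, initial: T):
        self._lock = threading.Lock()
        self._value = initial
        self._stamp = time.monotonic()

    def set(self, value: T) -> None:
        # Called by the perception thread after each new prediction.
        with self._lock:
            self._value = value
            self._stamp = time.monotonic()

    def get(self) -> tuple[T, float]:
        # Called by the control loop; returns (value, age in seconds).
        with self._lock:
            return self._value, time.monotonic() - self._stamp


def choose_skill(latest: LatestValue[str], max_age_s: float, fallback: str) -> str:
    # Use the latest prediction unless it is older than max_age_s.
    skill, age = latest.get()
    return skill if age <= max_age_s else fallback
```

The staleness check gives the controller a defined behavior when perception falls behind, instead of silently acting on an old prediction.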

Overall, this repo exemplifies how modular RL policies combined with learned perception can create a flexible and autonomous locomotion system for humanoids.

quick start to run the multi-skill locomotion system

The README provides a clear step-by-step process to get started:

Create reference motions using the provided Keyframe App. This sets the foundation for training.
Train RL policies for each skill in simulation with MuJoCo/MJX:

python toddlerbot/locomotion/train_mjx.py --gin-file <skill_name> --env <skill_name>

Collect real-world depth data and train the skill classifier:

Collect stereo RGB frames with skill labels:

python toddlerbot/skill_classifier/data/collect_real_world_skill_data.py

Process images offline into depth maps:

python toddlerbot/skill_classifier/data/create_depth_data.py

Train the classifier on the depth data:

python toddlerbot/skill_classifier/training/train.py <data_dir>

Run the full system:

Start the depth estimation server:

python toddlerbot/skill_classifier/run_foundation_stereo.py

Specify trained policy checkpoints in POLICY_CONFIGS inside run_multiple_policy.py, then run:

python toddlerbot/policies/run_multiple_policy.py --skill-classifier <classifier_checkpoint>

Once running, the system continuously classifies skills from depth data and executes the corresponding policy on the robot in real-time.
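Before launching run_multiple_policy.py, it helps to check that every label the classifier can emit has a trained policy behind it. The helper below is a hypothetical sketch: it prints the train_mjx.py command from step 2 for each skill as a dry run and cross-checks a POLICY_CONFIGS-style dictionary. The skill names, checkpoint paths and dictionary layout are placeholders, not the structure actually used in run_multiple_policy.py.

```python
import shlex
from pathlib import Path

# Hypothetical: skill names, checkpoint paths and the dict layout below are
# placeholders; check run_multiple_policy.py for the real POLICY_CONFIGS format.
SKILLS = ["walk", "climb"]
POLICY_CONFIGS = {
    "walk": {"checkpoint": "checkpoints/walk"},
    "climb": {"checkpoint": "checkpoints/climb"},
}


def training_commands(skills: list[str]) -> list[list[str]]:
    # One train_mjx.py invocation per skill, mirroring the quick-start command.
    return [
        ["python", "toddlerbot/locomotion/train_mjx.py", "--gin-file", s, "--env", s]
        for s in skills
    ]


def missing_policies(classifier_labels: list[str], configs: dict) -> list[str]:
    # Labels the classifier can emit but that have no policy configured.
    return sorted(set(classifier_labels) - set(configs))


def missing_checkpoints(configs: dict) -> list[str]:
    # Configured skills whose checkpoint path does not exist on disk.
    return sorted(s for s, cfg in configs.items() if not Path(cfg["checkpoint"]).exists())


if __name__ == "__main__":
    for cmd in training_commands(SKILLS):
        print(shlex.join(cmd))  # dry run: print commands rather than executing them
    print("labels without a policy:", missing_policies(SKILLS, POLICY_CONFIGS))
    print("missing checkpoints:", missing_checkpoints(POLICY_CONFIGS))
```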

verdict: who should consider ToddlerBot

ToddlerBot is well-suited for robotics researchers and developers interested in humanoid locomotion, reinforcement learning, and perception-driven autonomous behavior. Its all-Python approach and end-to-end tooling lower the entry barrier compared to mixed-language robotics stacks.

The modular policy and skill classification pipeline offers a practical pattern to handle multi-skill locomotion, a problem often overlooked in academic RL demos that focus on single tasks.

That said, ToddlerBot is not plug-and-play for casual users. It requires MuJoCo/MJX setup, real robot hardware capable of running the depth server and policies, and a willingness to engage with data collection and training pipelines.

If you’re working on scalable robotics control systems or want a reference implementation for whole-body locomotion with perception, this repo is worth exploring. It strikes a balance between flexible research tooling and practical deployment, with honest tradeoffs around complexity and dependencies.


→ GitHub Repo: hshi74/toddlerbot ⭐ 676 · Python

By Noureddine RAMDI
