Policy Training and Quadruped Control with Live PPO
Tags: Reinforcement Learning, Robotics, MuJoCo, Python, Quadruped Control
This project enables live, interactive control of a simulated
Unitree Go1 quadruped robot using pre-trained policies. Users
steer the robot in real time with a gamepad or keyboard inside the
MuJoCo Playground environment, which helps narrow the gap between
simulation and real-world deployment.
The work builds on Proximal Policy Optimization (PPO) for robotic
locomotion, creating a pipeline that converts user input into
velocity commands that drive PPO-trained locomotion policies.
This interactive approach makes it possible to test and visualize
learned quadruped behaviors before deployment on physical
hardware.
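As a rough illustration of the first stage of that pipeline, the sketch below turns raw keyboard or gamepad input into a normalized (forward, lateral, yaw) command. The key bindings, the sign conventions (left and counterclockwise positive), the deadzone value, and the names `KEY_BINDINGS`, `apply_deadzone`, and `keys_to_command` are all assumptions made for this example; the project's actual mapping may differ.

```python
from typing import Iterable, Tuple

# Hypothetical key bindings: each key contributes to (forward, lateral, yaw).
KEY_BINDINGS = {
    "w": (1.0, 0.0, 0.0),   # walk forward
    "s": (-1.0, 0.0, 0.0),  # walk backward
    "a": (0.0, 1.0, 0.0),   # strafe left
    "d": (0.0, -1.0, 0.0),  # strafe right
    "q": (0.0, 0.0, 1.0),   # yaw left
    "e": (0.0, 0.0, -1.0),  # yaw right
}


def apply_deadzone(value: float, deadzone: float = 0.1) -> float:
    """Zero out small stick deflections and rescale the rest to [-1, 1]."""
    if abs(value) < deadzone:
        return 0.0
    sign = 1.0 if value > 0 else -1.0
    return sign * (min(abs(value), 1.0) - deadzone) / (1.0 - deadzone)


def keys_to_command(pressed: Iterable[str]) -> Tuple[float, float, float]:
    """Sum the contributions of all pressed keys, clamped to [-1, 1] per axis."""
    total = [0.0, 0.0, 0.0]
    for key in pressed:
        # Unbound keys contribute nothing.
        for i, delta in enumerate(KEY_BINDINGS.get(key, (0.0, 0.0, 0.0))):
            total[i] += delta
    return tuple(max(-1.0, min(1.0, v)) for v in total)
```

With these assumed bindings, holding `w` and `e` together requests forward motion combined with a right yaw, while opposing keys such as `w` and `s` cancel out.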
Project Objectives
- Enable Live User-Input Control: Create a system where users
  directly control the Unitree Go1 in simulation using a gamepad
  or keyboard.
- Implement Simplified Transfer Pipeline: Design a workflow in
  which simulation-trained policies can be tested interactively
  before being deployed to physical robots.
- Bridge Simulation-to-Reality Gap: Allow trained policies to be
  tested under conditions that resemble real-world deployment.
Implementation Details
- Control Loop Architecture: Developed a live control loop using
  MuJoCo's passive viewer that converts gamepad/keyboard inputs
  into velocity commands for the trained policy.
- Policy Training: Used PPO with extensive hyperparameter tuning
  for robust quadruped locomotion across various terrains.
- Command Injection System: Created a pipeline in which commands
  are stored, normalized, scaled, and injected into the
  observation vector at each timestep.
- Multi-Platform Support: Implemented platform-specific versions
  for both standard systems (using Pygame) and macOS (using
  PySDL2).
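The command injection step described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `CommandInjector` class, the velocity limits in `max_velocity`, and the position of the command within the observation (`command_start`) are hypothetical, and the real observation layout depends on the environment the policy was trained in.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class CommandInjector:
    """Stores the latest user command and writes it into an observation."""

    # Hypothetical limits: forward (m/s), lateral (m/s), yaw rate (rad/s).
    max_velocity: Sequence[float] = (1.0, 0.5, 1.0)
    # Hypothetical index of the first of the 3 command entries.
    command_start: int = 0
    _command: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])

    def store(self, raw: Sequence[float]) -> None:
        """Store: keep the most recent raw input until a new one arrives."""
        self._command = list(raw)

    def scaled_command(self) -> List[float]:
        """Normalize each axis to [-1, 1], then scale to velocity units."""
        normalized = [max(-1.0, min(1.0, v)) for v in self._command]
        return [n * m for n, m in zip(normalized, self.max_velocity)]

    def inject(self, observation: Sequence[float]) -> List[float]:
        """Return a copy of the observation with the command slots overwritten."""
        obs = list(observation)
        end = self.command_start + 3
        if end > len(obs):
            raise ValueError("observation too short for the command slots")
        obs[self.command_start:end] = self.scaled_command()
        return obs
```

In a control loop, `store` would be called whenever new input arrives and `inject` once per timestep, just before the observation is passed to the policy. Returning a copy rather than mutating the input keeps the simulator's own observation untouched, which is a design choice of this sketch.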
Key Outcomes
- Interactive Robotic Control: Implemented real-time control of a
  simulated quadruped robot driven by trained policies in the
  MuJoCo environment.
- Multi-Terrain Functionality: The system works in both flat and
  rough terrain environments, demonstrating the robustness of the
  trained policies.
- Platform Flexibility: Created cross-platform solutions that
  work on standard systems and macOS, making the project more
  accessible to researchers and developers.
- Foundation for Future Work: Established a framework that can be
  extended to robotic systems and control methods beyond the
  Unitree Go1.
Go1 in Action
- Yaw Right
- Yaw Left
- Walk Straight
- Walk Backwards