Policy Training and Quadruped Control with Live PPO

Reinforcement Learning, Robotics, MuJoCo, Python, Quadruped Control

This project provides live, interactive control of a simulated Unitree Go1 quadruped robot driven by pre-trained policies. Users steer the robot in real time with a gamepad or keyboard inside the MuJoCo Playground environment, which narrows the gap between simulation and real-world deployment by exercising a policy under operator commands before it runs on hardware.

The work builds on Proximal Policy Optimization (PPO) for robotic locomotion and adds a pipeline that converts user input into commands that drive the PPO-trained locomotion policies. This interactive setup makes it possible to test and visualize learned quadruped behaviors before deployment on physical hardware.
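
As a rough illustration of the input side of this pipeline, the sketch below maps raw gamepad stick values, or keyboard presses treated as full stick deflections, to a normalized three-component command (forward, lateral, yaw). The function names, the key bindings, the dead-zone width, and the sign convention (positive yaw turns left) are assumptions made for this example and are not taken from the project.

```python
import numpy as np

def axes_to_command(forward, strafe, turn, deadzone=0.1):
    """Map raw input axes to a normalized (vx, vy, yaw) command in [-1, 1].

    Hypothetical helper: the dead-zone default is an illustrative value.
    """
    raw = np.array([forward, strafe, turn], dtype=np.float32)
    # Clip first so an out-of-range device reading cannot exceed full scale.
    raw = np.clip(raw, -1.0, 1.0)
    # Zero out small stick drift around the center position.
    raw[np.abs(raw) < deadzone] = 0.0
    return raw

# Assumed key bindings: each key acts as a full deflection on one axis.
KEY_TO_AXES = {
    "w": (1.0, 0.0, 0.0),   # walk forward
    "s": (-1.0, 0.0, 0.0),  # walk backward
    "a": (0.0, 0.0, 1.0),   # yaw left
    "d": (0.0, 0.0, -1.0),  # yaw right
}

def keys_to_command(pressed):
    """Sum the contributions of all pressed keys, then normalize as above."""
    total = np.zeros(3, dtype=np.float32)
    for key in pressed:
        total += np.array(KEY_TO_AXES.get(key, (0.0, 0.0, 0.0)), dtype=np.float32)
    return axes_to_command(*total)
```

Routing both input devices through the same normalized range keeps the rest of the control loop independent of which device is in use; opposing keys such as `w` and `s` simply cancel.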

Project Objectives

  • Enable Live User Control: Let users directly control the Unitree Go1 in simulation with either a gamepad or keyboard.
  • Implement a Simplified Transfer Pipeline: Design a workflow in which simulation-trained policies can be tested interactively before they are deployed to physical robots.
  • Bridge the Simulation-to-Reality Gap: Allow trained policies to be tested under conditions that resemble real-world deployment.

Implementation Details

  • Control Loop Architecture: Developed a live control loop using MuJoCo's passive viewer that transforms controller/keyboard inputs into velocity commands for the trained policy.
  • Policy Training: Used PPO with extensive hyperparameter tuning to train robust quadruped locomotion on flat and rough terrain.
  • Command Injection System: Created a pipeline where commands are stored, normalized, scaled, and injected into the observation vector at each timestep.
  • Multi-Platform Support: Implemented platform-specific input handling, using Pygame on standard systems and PySDL2 on macOS.
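
The command injection step in the list above (store, normalize, scale, inject) can be sketched as follows. This is a minimal illustration under stated assumptions: the `CommandInjector` class, the per-axis velocity limits, and the position of the command inside the observation vector are all hypothetical, since the real observation layout depends on the environment the policy was trained in.

```python
import numpy as np

class CommandInjector:
    """Stores the latest user command and writes it into an observation vector."""

    def __init__(self, cmd_slice, max_cmd=(1.0, 0.5, 1.0)):
        # cmd_slice: assumed location of (vx, vy, yaw rate) in the observation.
        # max_cmd: assumed per-axis limits (m/s, m/s, rad/s), illustrative only.
        self.cmd_slice = cmd_slice
        self.max_cmd = np.asarray(max_cmd, dtype=np.float32)
        self.command = np.zeros(3, dtype=np.float32)

    def store(self, raw):
        """Normalize raw input to [-1, 1], then scale it to the velocity limits."""
        normalized = np.clip(np.asarray(raw, dtype=np.float32), -1.0, 1.0)
        self.command = normalized * self.max_cmd

    def inject(self, obs):
        """Return a copy of obs with the command entries overwritten."""
        out = np.array(obs, dtype=np.float32, copy=True)
        out[self.cmd_slice] = self.command
        return out

# Toy example: a 10-element observation whose last three entries hold the command.
injector = CommandInjector(cmd_slice=slice(7, 10))
injector.store([1.5, 1.0, -0.5])            # out-of-range forward input is clipped
obs = injector.inject(np.zeros(10, dtype=np.float32))
```

In a live loop, `store` would be called whenever new input arrives and `inject` once per timestep, just before the observation is passed to the policy. Returning a copy rather than mutating the input keeps the simulator's own observation untouched.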

Key Outcomes

  • Interactive Robotic Control: Successfully implemented real-time control of a simulated quadruped robot with trained policies in the MuJoCo environment.
  • Multi-Terrain Functionality: The system works on both flat and rough terrain environments, demonstrating the robustness of the trained policies.
  • Platform Flexibility: Created cross-platform solutions that work on standard systems and macOS, expanding accessibility for researchers and developers.
  • Foundation for Future Work: Established a framework that can be extended to other robotic systems and control methodologies beyond the Unitree Go1.

Go1 in Action

  • Yaw Right
  • Yaw Left
  • Walk Straight
  • Walk Backwards