Policy Training and Quadruped Control with Live PPO

This project provides live, interactive control of a simulated Unitree Go1 quadruped robot driven by pre-trained policies. Users steer the robot in real time with a gamepad or keyboard inside the MuJoCo Playground environment, which helps bridge the gap between simulation and real-world deployment.
The work builds on Proximal Policy Optimization (PPO) for robotic locomotion: a pipeline converts user input into commands that drive PPO-trained locomotion policies. This interactive approach makes it possible to test and visualize learned quadruped behaviors before deploying them on physical hardware.
Project Objectives
- Enable Live User Control: Let users directly control the Unitree Go1 in simulation with a gamepad or keyboard.
- Implement Simplified Transfer Pipeline: Design a workflow in which simulation-trained policies can be tested interactively before they are deployed to physical robots.
- Bridge Simulation-to-Reality Gap: Allow trained policies to be tested under conditions that resemble real-world deployment.
Implementation Details
- Control Loop Architecture: Developed a live control loop around MuJoCo's passive viewer; the loop converts gamepad/keyboard inputs into velocity commands for the trained policy.
- Policy Training: Trained policies with PPO, tuning hyperparameters extensively for robust quadruped locomotion across various terrains.
- Command Injection System: Created a pipeline where commands are stored, normalized, scaled, and injected into the observation vector at each timestep.
- Multi-Platform Support: Implemented platform-specific input handling, using Pygame on standard systems and PySDL2 on macOS.
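The command injection step described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the key bindings, the deadzone, the velocity limits (MAX_VX, MAX_VY, MAX_YAW), and the position of the command inside the observation vector (start) are all hypothetical, since the text does not specify them. The sketch uses only the Python standard library, so it runs without MuJoCo, Pygame, or PySDL2.

```python
# Sketch only: limits, key bindings, and the observation index are assumptions.
MAX_VX, MAX_VY, MAX_YAW = 1.0, 0.5, 1.0  # hypothetical command limits
DEADZONE = 0.1                            # hypothetical stick deadzone


def apply_deadzone(value, deadzone=DEADZONE):
    """Zero small deflections, then rescale the remainder toward [-1, 1]."""
    if abs(value) < deadzone:
        return 0.0
    sign = 1.0 if value > 0 else -1.0
    return sign * (abs(value) - deadzone) / (1.0 - deadzone)


def clamp(value, low=-1.0, high=1.0):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def keys_to_axes(pressed):
    """Map pressed key names to axes (forward, left, yaw), each -1, 0, or 1."""
    forward = float(("w" in pressed) - ("s" in pressed))
    left = float(("a" in pressed) - ("d" in pressed))
    yaw = float(("q" in pressed) - ("e" in pressed))
    return forward, left, yaw


def axes_to_command(forward, left, yaw):
    """Normalize raw axes and scale them to [vx, vy, yaw_rate]."""
    limits = (MAX_VX, MAX_VY, MAX_YAW)
    axes = (forward, left, yaw)
    return [clamp(apply_deadzone(a)) * m for a, m in zip(axes, limits)]


def inject_command(obs, command, start):
    """Return a copy of obs with command written at obs[start:start + 3]."""
    if start < 0 or start + len(command) > len(obs):
        raise ValueError("command does not fit in the observation vector")
    out = list(obs)
    out[start:start + len(command)] = command
    return out


# Usage: a 6-element stand-in observation with the command in slots 3..5.
obs = [0.0] * 6
command = axes_to_command(*keys_to_axes({"w", "q"}))
obs_with_command = inject_command(obs, command, start=3)
```

In a live loop, the last three lines would run once per timestep: read the current input, rebuild the command, and overwrite the command slots before the observation is passed to the policy. Returning a copy instead of mutating in place keeps the simulator's own observation untouched, which makes the step easier to test in isolation.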
Key Outcomes
- Interactive Robotic Control: Achieved real-time control of a simulated quadruped robot running trained policies in the MuJoCo environment.
- Multi-Terrain Functionality: The system works on both flat and rough terrain environments, demonstrating the robustness of the trained policies.
- Platform Flexibility: Created cross-platform solutions that work on standard systems and macOS, expanding accessibility for researchers and developers.
- Foundation for Future Work: Established a framework that can be extended to other robotic systems and control methodologies beyond the Unitree Go1.




