live control go1 - sam burns

Policy Training and Quadruped Control with Live PPO

Reinforcement Learning, Robotics, MuJoCo, Python, Quadruped Control

This project enables live, interactive control of a simulated Unitree Go1 quadruped robot using pre-trained policies. Users steer the robot in real time with either a gamepad or a keyboard inside the MuJoCo Playground environment, which helps bridge the gap between simulation and real-world deployment.

The work builds on Proximal Policy Optimization (PPO) techniques for robotic locomotion, creating a pipeline that converts user inputs into the commands that drive PPO-trained locomotion policies. This interactive approach makes it possible to test and visualize learned quadruped behaviors before deploying them on physical hardware.

Project Objectives

Enable Live User Control: Create a system where users can directly control the Unitree Go1 in simulation using either a gamepad or keyboard input.
Implement Simplified Transfer Pipeline: Design a workflow where simulation-trained policies can be tested interactively before deploying to physical robots.
Bridge Simulation-to-Reality Gap: Create a system that allows for testing of trained policies in a way that resembles real-world deployment conditions.

Implementation Details

Control Loop Architecture: Developed a live control loop using MuJoCo's passive viewer that transforms controller/keyboard inputs into velocity commands for the trained policy.
Policy Training: Utilized PPO (Proximal Policy Optimization) with extensive hyperparameter tuning for robust quadruped locomotion across various terrains.
Command Injection System: Created a pipeline where commands are stored, normalized, scaled, and injected into the observation vector at each timestep.
Multi-Platform Support: Implemented platform-specific optimizations for both standard systems (using Pygame) and macOS (using PySDL2).
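
The command path described above (store, normalize, scale, inject) can be illustrated with a minimal Python sketch. It is not the project's actual code: the deadzone value, the velocity limits in MAX_COMMAND, the position of the command inside the observation (COMMAND_SLICE), and all function names are assumptions chosen for illustration. In a real loop the raw axis values would come from Pygame or PySDL2, and the resulting observation would be passed to the trained policy.

```python
# Hypothetical sketch of the command path: raw input -> normalized -> scaled -> observation.
# Every constant and name here is an illustrative assumption, not the project's real value.

DEADZONE = 0.1                  # assumed stick deadzone
MAX_COMMAND = (1.0, 0.5, 1.0)   # assumed limits for (vx m/s, vy m/s, yaw rate rad/s)
COMMAND_SLICE = slice(0, 3)     # hypothetical location of the command in the observation


def apply_deadzone(value, deadzone=DEADZONE):
    """Clamp to [-1, 1], zero out small stick noise, and rescale the rest to [-1, 1]."""
    value = max(-1.0, min(1.0, value))
    if abs(value) < deadzone:
        return 0.0
    sign = 1.0 if value > 0 else -1.0
    return sign * (abs(value) - deadzone) / (1.0 - deadzone)


def axes_to_command(forward, strafe, turn):
    """Normalize three raw axis readings and scale them to velocity commands."""
    normalized = [apply_deadzone(axis) for axis in (forward, strafe, turn)]
    return [n * limit for n, limit in zip(normalized, MAX_COMMAND)]


def inject_command(observation, command):
    """Return a copy of the observation with the command entries overwritten."""
    obs = list(observation)
    obs[COMMAND_SLICE] = command
    return obs


# Stored command, refreshed whenever new input arrives and reused every timestep.
latest_command = axes_to_command(forward=1.0, strafe=0.05, turn=-1.0)
observation = inject_command([9.0] * 6, latest_command)
```

Keeping the latest command in a variable that the control loop reads each timestep means the policy always sees a valid command, even when no new input event arrives between simulation steps.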

Key Outcomes

Interactive Robotic Control: Successfully implemented real-time control of a simulated quadruped robot with trained policies in the MuJoCo environment.
Multi-Terrain Functionality: The system works on both flat and rough terrain environments, demonstrating the robustness of the trained policies.
Platform Flexibility: Created cross-platform solutions that work on standard systems and macOS, expanding accessibility for researchers and developers.
Foundation for Future Work: Established a framework that can be extended to other robotic systems and control methodologies beyond the Unitree Go1.

Go1 in Action

Yaw Right

Yaw Left

Walk Straight

Walk Backwards

Explore the Code