Physical AI & Humanoid Robotics
From Digital Brain to Embodied Humanoid Intelligence
Welcome to the official course textbook for Physical AI & Humanoid Robotics — a university-level, 13-week program designed to take you from foundational AI theory to building and deploying autonomous humanoid robot systems.
This textbook has a built-in AI assistant powered by RAG (Retrieval-Augmented Generation). Click the chat orb in the bottom-right corner to ask questions about any topic in the book. You can also highlight any text on a page and click ✨ Ask about this to get a targeted explanation.
What You Will Learn
This course bridges the gap between AI software and physical robotic hardware. By the end of this program, you will be able to:
- Understand the principles of Physical AI and Embodied Intelligence
- Build and configure ROS 2 robotic systems using Python agents
- Simulate humanoid robots in Gazebo and Unity digital twins
- Leverage NVIDIA Isaac for AI-powered robot perception and navigation
- Create Vision-Language-Action (VLA) pipelines that accept voice commands and execute real-world tasks
Course Structure
| Week | Module | Topic |
|---|---|---|
| 1–2 | Introduction | Foundations of Physical AI, Sensor Systems |
| 3–5 | Module 1: ROS 2 | Robot Nervous System, Nodes, URDF |
| 6–7 | Module 2: Gazebo & Unity | Digital Twin, Physics Simulation |
| 8–10 | Module 3: NVIDIA Isaac | AI Brain, VSLAM, Reinforcement Learning |
| 11–13 | Module 4: VLA Capstone | Voice-to-Action, Autonomous Humanoid |
Prerequisites
- Python 3.10+ programming experience
- Basic understanding of Linux/Ubuntu command line
- Familiarity with neural networks (recommended, not required)
Hardware Options
This course supports multiple hardware configurations — from a high-end workstation with an RTX 4070 Ti to an affordable Jetson student kit (~$700). See the Hardware Requirements page for details.
Capstone Project
The final capstone project challenges you to build a fully autonomous humanoid that can:
- Receive a voice command ("Pick up the cup")
- Generate an action plan using GPT-4
- Navigate to the target using ROS 2 + Nav2
- Grasp the object using computer vision
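The four capstone stages above form a simple sense-plan-act loop. As a dependency-free sketch, the loop can be expressed with stubbed stages; every function name here (`transcribe`, `plan_actions`, `execute`) is an illustrative placeholder, not a real ROS 2, Nav2, or GPT-4 API:

```python
# Hypothetical sketch of the capstone voice-to-action loop.
# Each stub stands in for a real component you will build in the course:
# speech-to-text, an LLM planner, and ROS 2 action clients.

def transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text model."""
    return "Pick up the cup"

def plan_actions(command: str) -> list[dict]:
    """Stand-in for an LLM planner that turns a command into steps."""
    return [
        {"action": "navigate", "target": "cup"},  # would call Nav2
        {"action": "grasp", "target": "cup"},     # would call a grasp controller
    ]

def execute(step: dict) -> str:
    """Stand-in for dispatching a step to ROS 2 action servers."""
    return f"{step['action']}:{step['target']}:done"

def voice_to_action(audio: bytes) -> list[str]:
    """Run one full command through the pipeline."""
    command = transcribe(audio)
    return [execute(step) for step in plan_actions(command)]

print(voice_to_action(b"<raw microphone audio>"))
```

In the actual capstone, each stub is replaced by the corresponding module's component (Modules 1–4), but the top-level control flow stays this simple.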