Rishav Ganguly

PPO from Scratch: The Intuition Behind Clipped Policy Optimization

Wed, 13 May 2026 18:30:00 GMT

Proximal Policy Optimization, or PPO, is one of the most widely used policy gradient algorithms.

The main idea is:

Improve the policy, but do not let it change too much in one update.

Policy gradient objective

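A policy gradient method tries to maximize the expected return of trajectories sampled from the policy $\pi_\theta$:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right]$$

The policy gradient theorem gives an update direction:

$$\nabla_\theta J(\theta) = \mathbb{E}_t\left[ \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, \hat{A}_t \right]$$

Here, $\hat{A}_t$ is the advantage estimate: how much better action $a_t$ turned out to be than the policy's average behavior in state $s_t$.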
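
To make the update direction concrete, here is a minimal sketch in plain Python. It assumes a stateless softmax policy over two actions (a bandit, not a full environment), and the function names and the sample actions and advantages are made up for illustration. Each sampled action contributes the gradient of its log-probability, weighted by its advantage.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(logits, action):
    """Gradient of log pi(action) with respect to the softmax logits."""
    probs = softmax(logits)
    return [(1.0 if j == action else 0.0) - p for j, p in enumerate(probs)]

def policy_gradient_estimate(logits, actions, advantages):
    """Average of grad log pi(a) * advantage over the sampled actions."""
    grad = [0.0] * len(logits)
    n = len(actions)
    for a, adv in zip(actions, advantages):
        g = grad_log_prob(logits, a)
        for j in range(len(logits)):
            grad[j] += g[j] * adv / n
    return grad

# Hypothetical samples: action 0 had positive advantage, action 1 negative.
logits = [0.0, 0.0]
grad = policy_gradient_estimate(logits, actions=[0, 1], advantages=[1.0, -1.0])
```

In this toy case the estimate pushes the logit of action 0 up and the logit of action 1 down, which is the behavior the advantage weighting is meant to produce.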

Clipped objective

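PPO uses:

$$L^{\text{CLIP}}(\theta) = \mathbb{E}_t\left[ \min\left( r_t(\theta)\, \hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\, 1 - \epsilon,\, 1 + \epsilon\right) \hat{A}_t \right) \right]$$

where $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{\text{old}}}(a_t \mid s_t)$ is the probability ratio between the new and old policies, $\hat{A}_t$ is the advantage estimate, and $\epsilon$ is a small clipping range.

The clipping prevents the policy from changing too aggressively: once the ratio moves outside $[1 - \epsilon, 1 + \epsilon]$ in the direction that would increase the objective, that sample stops contributing to the gradient.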
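
A minimal sketch of the clipped surrogate in plain Python may help here. It is not a full PPO implementation: there is no value loss, entropy bonus, or optimizer. The function name, the argument names, and the default `clip_eps=0.2` are assumptions chosen for illustration.

```python
import math

def clipped_surrogate(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch of samples.

    The ratio is computed from log-probabilities, so it equals
    pi_new(a|s) / pi_old(a|s) for each sample.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two surrogate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)

# Ratio 1.5 with positive advantage: the gain is capped at 1.2 * advantage.
capped = clipped_surrogate([math.log(1.5)], [0.0], [1.0])
# Ratio 0.5 with positive advantage: the unclipped (smaller) term is kept.
uncapped = clipped_surrogate([math.log(0.5)], [0.0], [1.0])
```

The second call shows why the objective takes a minimum rather than simply clipping: when the unclipped term is already the smaller one, it is used as-is, so the objective never overstates an improvement.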

ACT from Scratch: Why Robot Policies Predict Action Chunks

Tue, 12 May 2026 18:30:00 GMT

Action Chunking Transformer, or ACT, is an imitation learning architecture for robot manipulation.

The key idea is simple:

Instead of predicting only the next action, predict a short sequence of future actions.

This sequence is called an action chunk.

Why single-step behavior cloning can fail

In standard behavior cloning, the policy learns:

$$\pi_\theta(a_t \mid o_t)$$

Given the current observation $o_t$, the policy predicts one action $a_t$.

The problem is that small mistakes accumulate. If the robot makes a slight error, it may enter a state that was rare in the training data. Then the policy becomes less reliable.

This is called compounding error.
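
A toy calculation can show how this plays out. The model below is an assumption made purely for illustration, not a result from any paper: each step adds a small error, and a hypothetical `sensitivity` parameter makes that error larger the further the robot has already drifted from the demonstrated states.

```python
def rollout_deviation(steps, step_error=0.01, sensitivity=0.0):
    """Deviation from the demonstrated trajectory after `steps` steps.

    Toy model: every step adds `step_error`, scaled up by how far
    the robot has already drifted (sensitivity * deviation).
    """
    deviation = 0.0
    for _ in range(steps):
        deviation += step_error * (1.0 + sensitivity * deviation)
    return deviation

# With sensitivity 0, errors simply add up step by step.
no_feedback = rollout_deviation(100, sensitivity=0.0)
# With sensitivity > 0, earlier errors make later errors larger.
with_feedback = rollout_deviation(100, sensitivity=5.0)
```

Under these assumptions the second rollout drifts much further than the first, even though the per-step error in familiar states is identical.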

Action chunking

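ACT predicts:

$$\pi_\theta(a_{t:t+k} \mid o_t)$$

So the model outputs multiple future actions, where $k$ is the chunk size:

$$a_t,\ a_{t+1},\ \dots,\ a_{t+k-1}$$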

This helps because the model learns short-horizon motion structure instead of isolated actions.
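
As a rough sketch of what chunked training targets can look like, the snippet below slices a demonstration into overlapping action chunks. It covers only the data side, not the ACT model itself. The function name, the scalar actions, and the choice to skip timesteps without a full chunk are assumptions made for illustration; an implementation could pad the final chunks instead.

```python
def make_chunk_targets(actions, chunk_size):
    """Pair each timestep t with the chunk actions[t : t + chunk_size].

    Timesteps near the end of the demonstration, where a full chunk
    is not available, are skipped in this sketch.
    """
    targets = []
    for t in range(len(actions) - chunk_size + 1):
        targets.append((t, actions[t:t + chunk_size]))
    return targets

# Hypothetical 1-D demonstration with five recorded actions.
demo_actions = [0.0, 0.1, 0.2, 0.3, 0.4]
chunks = make_chunk_targets(demo_actions, chunk_size=3)
```

Each entry pairs a timestep with the next three actions, so the observation at that timestep would be trained against the whole chunk rather than a single action.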

Why I Am Starting a Robot Learning Research Blog

Mon, 11 May 2026 18:30:00 GMT

I am starting this blog as a public research notebook.

My goal is simple: whenever I read a paper, implement an algorithm, debug a robotics setup, or run an experiment, I want to document the process clearly.

Why write publicly?

Research ideas become much clearer when they are written down. A blog forces me to answer questions like:

What problem is this paper actually solving?
What is the core mathematical idea?
What assumptions does the method make?
What happens when I try to implement it?
What fails in practice?

What I will write about

This blog will focus on:

Robot learning papers
Reinforcement learning algorithms
Imitation learning systems
Vision-language-action models
Real robot data collection
Engineering lessons from implementation

My intended style

I want each post to move from intuition to equations to code.

The rough structure will be:

motivation -> concept -> math -> implementation -> experiments -> lessons

Long-term goal

The long-term goal is to build a record of my learning and research progress in robot learning, especially for long-horizon manipulation and generalist robot policies.