This article dives into a new approach from UCLA researchers. They’ve figured out how to let optical processors train themselves right on the hardware, using reinforcement learning.
By skipping detailed simulations and digital models, this method helps optical systems adapt in real time to experimental quirks. That could open the door to more robust and scalable optical AI—something the field’s been chasing for a while.
Reinforcement Learning Meets Real Optical Hardware
People have usually trained optical computing systems by relying on accurate simulations or “digital twins” that try to mimic physical behavior in software. But real hardware never quite matches the model—misalignments, defects, random noise, and environmental changes creep in.
The UCLA team tackled this by creating a model-free, in situ training framework that learns straight from experimental measurements. Instead of optimizing a simulated system and hoping it works on the hardware, the diffractive optical processor itself acts as the learner.
The system tweaks its optical parameters, checks the measured output, and keeps improving through direct interaction with the real world. That's a big shift from how the field has typically operated.
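To make that loop concrete, here's a minimal sketch of model-free, in situ optimization. The paper's agent uses PPO (covered next); this toy version just keeps whichever random tweak measures better, and the "hardware" is stood in for by an FFT-based simulation purely so the example runs end to end. All names here are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32

# Stand-in for the physical processor: a programmable phase layer followed
# by free-space propagation, modeled with an FFT so this sketch is runnable.
# In the real experiment, this number comes back from the hardware itself.
def measure_reward(phase):
    field = np.exp(1j * phase)                        # phase-only modulation
    far_field = np.fft.fftshift(np.fft.fft2(field))   # propagate to "camera"
    intensity = np.abs(far_field) ** 2
    return intensity[N // 2, N // 2] / intensity.sum()  # energy on target pixel

# Model-free, in situ loop: tweak parameters, measure, keep what the
# measurement says is better. No digital twin anywhere.
phase = rng.uniform(0.0, 2 * np.pi, size=(N, N))
best = measure_reward(phase)
for _ in range(2000):
    trial = phase + rng.normal(0.0, 0.1, size=(N, N))  # small random tweak
    reward = measure_reward(trial)
    if reward > best:
        phase, best = trial, reward

print(f"fraction of energy on the target pixel: {best:.3f}")
```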
Why Proximal Policy Optimization?
The researchers picked proximal policy optimization (PPO) for their reinforcement learning algorithm. PPO’s known for balancing stability and efficiency—two things you really want when working with physical systems.
PPO updates its control strategy carefully, so you don’t get wild parameter swings that could mess things up. It can also reuse experimental data across several optimization steps, which means fewer physical measurements are needed—a huge plus when every data point costs time and effort in the lab.
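For reference, this is what PPO's clipped surrogate objective looks like in code. It's the standard PPO-clip formula rather than code from the paper, with the policy network and advantage estimation omitted.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized).

    ratio     -- pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage -- estimated advantage of that action
    eps       -- clip range; keeps each update close to the old policy
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped).mean()

# Clipping caps how much credit any one update can claim, so the same
# batch of (costly) physical measurements can be replayed for several
# gradient epochs without the policy drifting far from the one that
# collected the data.
ratios = np.array([0.7, 1.0, 1.6])   # new/old action probabilities
advs = np.array([1.0, -0.5, 2.0])    # estimated advantages
print(ppo_clip_objective(ratios, advs))
```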
Demonstrating In Situ Optical Learning
The UCLA team put their framework through its paces on a bunch of optical tasks. Turns out, this approach isn't a one-trick pony: it handles a range of tasks on the same diffractive photonic platform.
Focusing Through Disorder and Distortion
In one experiment, they asked the optical processor to focus light through a random, unknown diffuser. The system had zero prior knowledge about the diffuser’s properties.
PPO managed to learn how to concentrate optical energy right where it was needed. Compared to standard policy-gradient methods, PPO explored the optical parameter space better and got to the energy concentration goal faster. That’s pretty impressive, especially given how complex and high-dimensional these problems can get.
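One way to picture the setup: the agent controls a phase pattern, and its reward is simply how much measured energy lands in a small target window, with the diffuser hidden inside the measurement. The sketch below simulates that interface; the real reward would come from a camera frame, and the function names are mine, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64

# Unknown random diffuser: the learner never sees this array, only the
# rewards that come back from measurements taken through it.
diffuser = np.exp(1j * rng.uniform(0.0, 2 * np.pi, (N, N)))

def measured_frame(control_phase):
    """Simulated camera frame: control phase -> hidden diffuser -> far field."""
    field = np.exp(1j * control_phase) * diffuser
    return np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2

def focusing_reward(frame, center=(N // 2, N // 2), half=2):
    """Fraction of total energy inside a small window around the target."""
    r, c = center
    window = frame[r - half : r + half + 1, c - half : c + half + 1]
    return window.sum() / frame.sum()

# The agent's entire view of the physics is this one scalar:
reward = focusing_reward(measured_frame(np.zeros((N, N))))
```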
Advanced Optical Functions
They also tried hologram generation and aberration correction—both tasks that need precise phase control. The in situ training process naturally compensated for experimental imperfections.
They reached performance levels that would be tough to match with only simulation-based optimization. That’s not something you see every day.
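For pattern-forming tasks like these, the reward only needs to score how close the measured frame is to the desired intensity pattern. One plausible choice, not necessarily the paper's, is a normalized cross-correlation:

```python
import numpy as np

def pattern_reward(measured, target):
    """Normalized cross-correlation between a measured intensity frame and
    the desired hologram: 1.0 means a perfect (scaled) match. One reasonable
    reward choice; the paper's exact metric may differ."""
    m = (measured - measured.mean()) / (measured.std() + 1e-12)
    t = (target - target.mean()) / (target.std() + 1e-12)
    return float((m * t).mean())
```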
End-to-End Optical Intelligence
The handwritten-digit classification demo really stood out. The diffractive optical processor learned to produce clear, distinct output intensity patterns for different digits.
No Digital Post-Processing Required
All the classification happened in the optical domain. There was no downstream digital processing to clean up or interpret the results.
This shows genuine end-to-end learning on hardware. It feels like a real step toward practical optical neural networks that could process information at the speed of light.
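A reward for all-optical classification can be read straight off the camera: give each digit class its own detector region and reward the optics for routing light to the correct one. Here's a hypothetical sketch; the region geometry and margin-style reward are my assumptions, not the paper's stated design.

```python
import numpy as np

def region_energies(frame, regions):
    """Sum the measured intensity in each class's detector region.
    `regions` maps class label -> (row_slice, col_slice) on the camera."""
    return {label: float(frame[rs, cs].sum())
            for label, (rs, cs) in regions.items()}

def classification_reward(frame, regions, true_label):
    """Margin-style reward (assumed form): energy in the correct region
    minus the strongest competing region."""
    e = region_energies(frame, regions)
    correct = e.pop(true_label)
    return correct - max(e.values())
```

At inference time the predicted digit is simply the region with the most light, so reading out the answer needs nothing beyond the detectors themselves.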
Broader Implications for Physical AI Systems
Since the training happens directly on the device, the system naturally adapts to the messiness of real-world physics. The authors think this idea could go way beyond diffractive optics.
A General Framework for Adaptive Hardware
Any physical system that can adjust its parameters and receive real-time feedback could, in principle, benefit from the same approach.
This reinforcement-learning framework skips the need for perfect simulations. As a result, developers can move faster and build more robust optical technologies for the future.
Published in Light: Science & Applications (2026), the work led by Aydogan Ozcan and collaborators feels like a real milestone for self-learning physical systems. It’s exciting to think about where this could lead.
Here is the source article for this story: Reinforcement learning accelerates model-free training of optical AI systems