
A first for optical compute: running billion-parameter LLMs in real time

April 27, 2026

Iris Nova, the first in the new Lumai Iris family of servers announced today, is the world's first optical computing system to run billion-parameter large language models in real time.

For anyone tracking the limits of AI infrastructure, that sentence is worth pausing on. Until now, optical compute has lived mostly in research papers and lab demos. Iris Nova takes it into a server, running Llama 8B and 70B workloads on data center hardware.

Why this milestone matters

AI has moved into the inference era, and that shift has exposed two hard limits on performance.

The first is energy. The International Energy Agency expects global data center power demand to double by 2030. Data centers are already running into the ceiling of what grids and sites can supply.

The second is silicon itself. Each new generation of GPUs delivers smaller performance gains while drawing significantly more power and costing more to build and run. The physics of silicon is no longer keeping pace with the demands of AI inference.

Iris Nova is a working demonstration that there is another path. This view is shared by ARIA, the UK government-backed agency funding transformative research. As Suraj Bramhavar, Program Director at ARIA, put it: "The demands on existing AI processors necessitate an urgent search for alternative scaling pathways. Lumai is leading the charge in demonstrating that optical processors could provide one such pathway."

How optical compute changes the equation

Lumai's technology, born from years of research at the University of Oxford, performs the core mathematical operations of AI inference using light rather than electrons. By operating in three-dimensional volume instead of the two-dimensional plane of a chip, an optical system can execute millions of operations simultaneously through massive spatial parallelism.
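
To make "core mathematical operations" concrete: nearly all of the arithmetic in LLM inference is matrix multiplication, which is precisely the workload that spatial parallelism accelerates. Here is a back-of-the-envelope sketch (our illustration, not Lumai benchmark code; the layer dimensions are assumptions approximating a Llama-8B-class model):

```python
# Illustrative estimate: what fraction of a transformer layer's per-token
# FLOPs are matrix multiplications, i.e. the operation an optical tensor
# engine accelerates. Dimensions are assumptions for a Llama-8B-class layer.

d_model = 4096   # hidden size
d_ffn = 14336    # feed-forward inner width (gated: gate, up, down projections)
d_kv = 1024      # key/value projection width (grouped-query attention)

# Matmul FLOPs per token: roughly 2 * in_dim * out_dim per linear projection.
matmul_flops = 2 * (
    d_model * d_model        # query projection
    + 2 * d_model * d_kv     # key and value projections
    + d_model * d_model      # attention output projection
    + 3 * d_model * d_ffn    # gate, up, and down projections
)

# Everything else (norms, activations, rotary embeddings) is elementwise,
# generously budgeted here at ~20 FLOPs per hidden unit. Attention-score
# matmuls are omitted, and those are matrix products too, so this share
# is a lower bound.
other_flops = 20 * d_model

share = matmul_flops / (matmul_flops + other_flops)
print(f"matmul share of per-token FLOPs: {share:.2%}")  # ~99.98%
```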

In practical terms, that parallelism translates into three things customers care about:

  • Faster inference on compute-bound workloads
  • Higher execution efficiency per server
  • Up to 90% lower energy consumption than conventional architectures

Iris Nova combines an optical tensor engine, which handles the heavy matrix computation, with digital processing for system control and software. That hybrid design is deliberate: Iris Nova drops into existing data centers, and its lower power draw and reduced thermal load ease the demand on cooling and supporting infrastructure – translating directly into lower total cost of ownership.
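
The shape of that hybrid split can be sketched in a few lines of code. The example below is hypothetical, not Lumai's API: optical_matmul stands in for a dispatch to the optical tensor engine, and the toy shapes are ours. It only illustrates which parts of a transformer feed-forward block would run optically and which stay digital.

```python
# Conceptual sketch of the hybrid split (hypothetical code, not Lumai's API):
# dense matrix products go to the optical tensor engine, while control and
# nonlinear elementwise work stay on the digital side.
import numpy as np

def optical_matmul(x, w):
    """Stand-in for the optical tensor engine. In a real hybrid system this
    call would dispatch the multiply to photonic hardware; here it is
    simulated digitally so the sketch runs anywhere."""
    return x @ w

def feed_forward(x, w_gate, w_up, w_down):
    # Heavy lifting: three large matmuls, the optical engine's job.
    gate = optical_matmul(x, w_gate)
    up = optical_matmul(x, w_up)
    # Digital side: the elementwise SiLU nonlinearity and gating.
    hidden = gate / (1.0 + np.exp(-gate)) * up  # silu(gate) * up
    return optical_matmul(hidden, w_down)

# Toy shapes (a real Llama-class layer is far wider); the small weight
# scale keeps activations in a sane range.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 512)).astype(np.float32)
w_gate = (0.02 * rng.standard_normal((512, 2048))).astype(np.float32)
w_up = (0.02 * rng.standard_normal((512, 2048))).astype(np.float32)
w_down = (0.02 * rng.standard_normal((2048, 512))).astype(np.float32)
print(feed_forward(x, w_gate, w_up, w_down).shape)  # (4, 512)
```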

It is particularly well-suited to the prefill stage of disaggregated inference – the compute-bound phase where long contexts get processed at scale. Hyperscalers and frontier labs are increasingly separating prefill from decode because the two stages have very different demands: prefill is compute-bound and benefits from raw throughput, while decode is memory-bound and benefits from high memory bandwidth. Running them on the same hardware means compromising on both. Optical compute solves the challenge of compute-bound prefill.
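
To see why the two stages diverge, compare their arithmetic intensity: how many floating-point operations each performs per byte of model weights it has to move. A rough Python estimate (the model size, precision, and prompt length below are illustrative assumptions, not measured Iris Nova figures):

```python
# Why prefill is compute-bound and decode is memory-bound: a rough
# arithmetic-intensity estimate (FLOPs performed per byte of weights moved)
# for a 70B-parameter model. Illustrative assumptions throughout.

params = 70e9        # every weight participates once per forward pass
bytes_per_param = 2  # 16-bit weights

flops_per_token = 2 * params  # one multiply-accumulate per weight per token

def arithmetic_intensity(tokens_per_pass):
    # Weights are streamed once per pass and amortized over its tokens.
    flops = flops_per_token * tokens_per_pass
    bytes_moved = params * bytes_per_param
    return flops / bytes_moved

print(f"prefill (4096-token prompt): {arithmetic_intensity(4096):.0f} FLOPs/byte")
print(f"decode (1 token at a time):  {arithmetic_intensity(1):.0f} FLOPs/byte")
```

Decode performs roughly one operation per weight byte it moves, so memory bandwidth sets its ceiling; prefill amortizes the same weight traffic over thousands of prompt tokens, pushing the bottleneck to raw compute throughput. That compute-bound slice of the pipeline is the one Iris Nova targets.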

What's available, and what comes next

Iris Nova is available today for evaluation by hyperscalers, neo-clouds, enterprises, and research institutions. It's the first in a family that will include Aura and Tetra, each extending performance and efficiency further and supporting broader deployment.

As our CEO Dr. Xianxin Guo put it: "As the industry transitions into the inference era, we are simultaneously crossing the threshold into the post-silicon era. By shifting the computation paradigm from electrons to photons, Lumai can deliver an order-of-magnitude increase in performance with significant energy savings."

The age of optical AI has begun. And we are just getting started.

To learn more or request an evaluation of Iris Nova, visit lumai.ai/eval.