AI researchers run AI chatbots at a lightbulb-esque 13 watts with no performance loss — stripping matrix multiplication from LLMs yields massive gains

Aresearch paper from UC Santa Cruzand accompanyingwriteupdiscussing how AI researchers found a way to run modern, billion-parameter-scale LLMs on just 13 watts of power. That’s about the same as a 100W-equivalent LED bulb, but more importantly, its about 50 times more efficient than the 700W of power that’s needed by data center GPUs like theNvidia H100andH200, never mind the upcomingBlackwell B200that can use up to 1200W per GPU.The work was done using custom FGPA hardware, but the researchers clarify that (most) of their efficiency gains can be applied through open-source software and tweaking of existing setups. Most of the gains come from the removal of matrix multiplication (MatMul) from the LLM training and inference processes.How was MatMul removed from a neural network while maintaining the same performance and accuracy? The researchers combined two methods. First, they converted the numeric system to a “ternary” system using -1, 0, and 1. This makes computation possible with summing rather than multiplying numbers. They then introduced time-based computation to the equation, giving the network an effective “memory” to allow it to perform even faster with fewer operations being run.The mainstream model that the researchers used as a reference point is Meta’s LLaMa LLM. The endeavor was inspired by aMicrosoftpaper on using ternary numbers in neural networks, though Microsoft did not go as far as removing matrix multiplication or open-sourcing their model like the UC Santa Cruz researchers did.It boils down to an optimization problem. Rui-Jie Zhu, one of the graduate students working on the paper, says, “We replaced the expensive operation with cheaper operations.” Whether the approach can be universally applied to AI and LLM solutions remains to be seen, but if viable it has the potential to radically alter the AI landscape.We’ve witnessed a seemingly insatiable desire for power from leading AI companies over the past year. This research suggests that much of this has been a race to be first while using inefficient processing methods. We’ve heard comments from reputable figures like Arm’s CEO warning that AI power demands continuing to increase at current rates would consumeone fourth of the United States' powerby 2030. Cutting power use down to 1/50 of the current amount would represent a massive improvement.Here’s hoping Meta, OpenAI, Google, Nvidia, and all the other major players will find ways to leverage this open-source breakthrough. Faster and far more efficient processing of AI workloads would bring us closer to human brain levels of functionality — a brain gets by with approximately 0.3 kWh of power per day by some estimates, or 1/56 of what an Nvidia H100 requires. Of course, many LLMs require tens of thousands of such GPUs and months of training, so our gray matter isn’t quite outdated just yet.

Get Tom’s Hardware’s best news and in-depth reviews, straight to your inbox.

LED lightbulbs, which usually consume about 10 Watts of power a piece.

Christopher Harper has been a successful freelance tech writer specializing in PC hardware and gaming since 2015, and ghostwrote for various B2B clients in High School before that. Outside of work, Christopher is best known to friends and rivals as an active competitive player in various eSports (particularly fighting games and arena shooters) and a purveyor of music ranging from Jimi Hendrix to Killer Mike to the Sonic Adventure 2 soundtrack.

Christopher Harper