AMD unveils its first small language model, AMD-135M — AI performance enhanced by speculative decoding

As AMD flexes its muscles in the AI game, it is not only introducing new hardware but is betting on software too, trying to hit new market segments not already dominated by Nvidia.

Thus, AMD has unveiled its first small language model, AMD-135M, which belongs to the Llama family and is aimed at private business deployments. It is unclear whether the new model has anything to do with the company’s recent acquisition of Silo AI (the deal still has to be finalized and cleared by various authorities, so probably not), but this is a clear step toward addressing the needs of specific customers with a model pre-trained by AMD and run on AMD hardware for inference.


AMD’s models are fast mainly because they use so-called speculative decoding. Speculative decoding introduces a smaller ‘draft model’ that quickly proposes several candidate tokens. These tokens are then passed to a larger, more accurate ‘target model’ that verifies or corrects them in a single forward pass. On the one hand, this approach allows multiple tokens to be produced per pass of the target model; on the other hand, it comes at the cost of power due to increased data transactions.
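To make the draft-and-verify loop more concrete, here is a minimal sketch in Python. It is not AMD's implementation: `draft_model`, `target_model` and `speculative_decode` are hypothetical toy functions invented for illustration, tokens are plain integers, and acceptance is a simple greedy match rather than the probabilistic acceptance rule that production systems typically use.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Toy stand-in for the large model: sum of the last two tokens, mod 10."""
    return (prefix[-1] + prefix[-2]) % 10


def draft_model(prefix: List[int]) -> int:
    """Toy stand-in for the small model: matches the target except when the
    last token is divisible by 3, where it is deliberately off by one."""
    guess = (prefix[-1] + prefix[-2]) % 10
    return guess if prefix[-1] % 3 else (guess + 1) % 10


def greedy_decode(prompt: List[int], model: Model, num_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(prompt: List[int], draft: Model, target: Model,
                       num_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of target passes)."""
    tokens = list(prompt)
    goal = len(prompt) + num_new
    target_passes = 0
    while len(tokens) < goal:
        # 1) The draft model proposes k candidate tokens, one after another.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2) The target model scores all k + 1 positions. A real transformer
        #    would do this in one batched forward pass; the toy loops instead.
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        target_passes += 1
        # 3) Keep the longest prefix on which draft and target agree, then
        #    append the target's own token (a correction or a bonus token).
        accepted = 0
        while accepted < k and proposal[accepted] == verified[accepted]:
            accepted += 1
        tokens.extend(proposal[:accepted])
        tokens.append(verified[accepted])
    return tokens[:goal], target_passes


if __name__ == "__main__":
    out, passes = speculative_decode([1, 1], draft_model, target_model, num_new=12)
    print(out, "target passes:", passes)
```

In this greedy variant the output is identical to what the target model would produce on its own; the saving is that the target runs fewer times than the number of tokens generated whenever the draft guesses correctly.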

AMD’s new release comes in two versions, AMD-Llama-135M and AMD-Llama-135M-code, each designed to optimize specific tasks by accelerating inference with speculative decoding, a logical thing to do for an AI service based on a small language model. Both come out ahead in performance tests conducted by AMD.

Anton Shilov

AMD believes that further optimizations can lead to even better performance. Yet, as the company shares benchmark numbers for its previous-generation GPUs, we can only imagine what its current-generation (MI300X) and next-generation (MI325X) accelerators could do.


Anton Shilov is a contributing writer at Tom’s Hardware. Over the past couple of decades, he has covered everything from CPUs and GPUs to supercomputers and from modern process technologies and latest fab tools to high-tech industry trends.