Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2

Compute Blog

This article demonstrates how to accelerate CPU-based AI inference on Amazon EC2 using Intel Advanced Matrix Extensions (AMX), achieving up to a 76% performance improvement through hardware acceleration and reduced-precision (BF16) formats.
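Percentages like these are easiest to compare as latency reductions relative to a baseline. The sketch below is a minimal illustration. It assumes "improvement" means percent reduction in latency, which the article may define differently. It shows how two independent reductions, such as a newer instance type and BF16 precision, compound multiplicatively rather than adding. The function names and all numbers are hypothetical placeholders, not the article's measurements.

```python
def latency_improvement_pct(baseline_ms: float, optimized_ms: float) -> float:
    """Percent reduction in latency relative to a baseline (assumed definition)."""
    if baseline_ms <= 0:
        raise ValueError("baseline_ms must be positive")
    return (baseline_ms - optimized_ms) / baseline_ms * 100.0


def compound_improvement_pct(*improvements_pct: float) -> float:
    """Combine independent latency reductions.

    Each reduction scales the remaining latency by (1 - p/100), so the
    factors multiply; simply adding the percentages would overstate the gain.
    """
    remaining = 1.0
    for p in improvements_pct:
        remaining *= 1.0 - p / 100.0
    return (1.0 - remaining) * 100.0


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    instance_gain = latency_improvement_pct(100.0, 90.0)  # newer instance: 100 ms -> 90 ms
    precision_gain = latency_improvement_pct(90.0, 45.0)  # lower precision: 90 ms -> 45 ms
    print(f"combined: {compound_improvement_pct(instance_gain, precision_gain):.1f}%")
```

With a 10% gain from the instance and a 50% gain from precision, the combined reduction is 55% (100 ms to 45 ms), not 60%.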

Intel AMX accelerates matrix operations directly on CPU cores using specialized hardware
BF16 precision with AMX reduces latency by 21-72% at batch sizes of 8 and above
EC2 m8i instances provide 9-14% better performance than m7i across tested models
Batch sizes of 4-16 maximize AMX benefits, with the best value depending on the model architecture
Combining m8i with BF16 AMX optimization achieves up to 76% improvement over m7i with FP32
CPU inference is cost-effective for batch processing, small-to-medium models, and variable workloads
PyTorch leverages AMX automatically; the minimal changes required are made through environment variables
Benchmarks cover six models: BigBird, DialoGPT, Gemma, DeepSeek, Llama, and YOLOv5
m8i delivers up to 13% better price-performance than m7i for inference workloads
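The following is a rough sketch of the setup steps these points allude to, under stated assumptions. On Linux, AMX support appears as `amx_bf16`, `amx_tile`, and `amx_int8` flags in `/proc/cpuinfo`. PyTorch's CPU matrix kernels come from oneDNN, which reads `ONEDNN_MAX_CPU_ISA` and `ONEDNN_VERBOSE` when it initializes. The summary does not say which environment variables the article sets, so the oneDNN variables here are an assumption. The helper names (`amx_flags`, `read_amx_flags`, `onednn_env`, `run_bf16_inference`) are hypothetical. BF16 is applied with `torch.autocast`, and PyTorch is imported lazily so the detection helpers run without it.

```python
import os


def amx_flags(cpuinfo_text: str) -> set:
    """Return AMX-related CPU flags (e.g. amx_bf16, amx_tile) from /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            flags.update(f for f in value.split() if f.startswith("amx"))
    return flags


def read_amx_flags(path: str = "/proc/cpuinfo") -> set:
    """Read AMX flags on the local machine (Linux only); empty set if unavailable."""
    try:
        with open(path, encoding="utf-8") as fh:
            return amx_flags(fh.read())
    except OSError:
        return set()


def onednn_env(verbose: bool = False) -> dict:
    """Assumed oneDNN settings: allow dispatch up to AMX kernels, optionally log kernels."""
    env = {"ONEDNN_MAX_CPU_ISA": "AVX512_CORE_AMX"}
    if verbose:
        env["ONEDNN_VERBOSE"] = "1"  # logs each primitive with its ISA-specific kernel name
    return env


def run_bf16_inference(model, inputs):
    """Forward pass under CPU BF16 autocast (requires PyTorch)."""
    import torch

    model.eval()
    with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        return model(inputs)


if __name__ == "__main__":
    found = read_amx_flags()
    print("AMX flags:", ", ".join(sorted(found)) or "none detected")
    # oneDNN reads these at initialization, so set them before importing torch.
    os.environ.update(onednn_env(verbose=True))
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        out = run_bf16_inference(torch.nn.Linear(64, 64), torch.randn(8, 64))  # batch of 8
        print("output dtype:", out.dtype)
```

With `ONEDNN_VERBOSE=1`, searching the logged kernel names for `amx` is one way to check whether AMX kernels actually ran. Recent oneDNN builds may already enable AMX by default, in which case `ONEDNN_MAX_CPU_ISA` is only needed to cap or compare instruction sets.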

This guide enables organizations to optimize CPU-based AI inference costs while maintaining performance through Intel AMX acceleration on modern EC2 instances.
