Learn how to deploy Falcon 2 11B on Amazon EC2 c7i instances for model inference
Compute Blog
This article describes how to deploy the Falcon 2 11B large language model from the Technology Innovation Institute (TII) on Amazon EC2 c7i instances, using Intel Advanced Matrix Extensions (Intel AMX) to accelerate model inference.
Specifically, the article covers:
- An introduction to the Falcon 2 11B model and its availability on Amazon SageMaker JumpStart
- The benefits of INT8 and INT4 model quantization with OpenVINO for faster inference
- Benchmark results showing the latency and throughput improvements of the quantized models
- A step-by-step guide to quantizing Falcon 2 11B with OpenVINO and running inference on c7i instances
- A conclusion on the cost-effectiveness of deploying large models on CPUs using quantization
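To make the INT8/INT4 trade-off more concrete, the sketch below does two things under stated assumptions. First, it estimates weight storage for a model with a round 11 billion parameters at 16, 8, and 4 bits per weight (this ignores quantization scales, activations, and the KV cache, so real footprints are larger). Second, it shows a toy symmetric per-tensor INT8 quantizer. The helper names (`weight_gb`, `quantize_int8`, `dequantize_int8`) are hypothetical, and this is a generic illustration of quantization, not the algorithm OpenVINO or NNCF actually applies to Falcon 2 11B.

```python
# Hypothetical illustration only: NOT OpenVINO's (NNCF's) quantization algorithm.

PARAMS = 11_000_000_000  # assumption: round figure for an "11B" model

BITS_PER_WEIGHT = {"BF16": 16, "INT8": 8, "INT4": 4}


def weight_gb(num_params: int, bits: int) -> float:
    """Approximate weight storage in GB (10**9 bytes).

    Ignores quantization scales/zero points, activations, and the KV cache.
    """
    return num_params * bits / 8 / 1e9


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: q = round(x / scale), clamped to [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate floats: x_hat = q * scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    # Halving the bits per weight roughly halves the weight memory.
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weight_gb(PARAMS, bits):.1f} GB of weights")

    weights = [0.12, -0.5, 0.33, 0.01, -0.27]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    # Rounding to the nearest code bounds the per-value error by scale / 2.
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 codes:", q)
    print(f"max round-trip error: {max_err:.4f} (bound: {scale / 2:.4f})")
```

Smaller weights mean less memory traffic per generated token, which is one reason quantized models can run faster on CPUs; the cost is a bounded rounding error per weight, which is why accuracy should be checked after quantizing.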