
Learn how to deploy Falcon 2 11B on Amazon EC2 c7i instances for model inference

Compute Blog



This article describes how to deploy the Falcon 2 11B large language model from the Technology Innovation Institute (TII) on Amazon EC2 c7i instances, using Intel Advanced Matrix Extensions (Intel AMX) for model inference.

Specifically, the article covers:

  • An introduction to the Falcon 2 11B model and its availability on Amazon SageMaker JumpStart
  • The benefits of INT8 and INT4 model quantization with OpenVINO for faster inference
  • Benchmark results showing latency and throughput improvements with quantized models
  • A step-by-step guide to quantizing Falcon 2 11B with OpenVINO and running inference on c7i instances
  • A conclusion on the cost-effectiveness of deploying large models on CPUs using quantization
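
To give a rough sense of why INT8 and INT4 quantization matter for CPU deployment, the sketch below estimates the storage needed for the model weights alone at several precisions. It is a back-of-the-envelope calculation, not a figure from the article: it assumes "11B" means exactly 11 billion parameters, and it ignores activations, the KV cache, and the extra scale and zero-point metadata that real quantized formats carry. The names `PARAMS`, `BITS_PER_WEIGHT`, `weight_footprint_gb`, and `footprint_table` are hypothetical and chosen for illustration.

```python
# Assumption: "11B" is taken as exactly 11 billion parameters.
PARAMS = 11e9

# Bits used to store one weight at each precision.
BITS_PER_WEIGHT = {"FP32": 32, "BF16": 16, "INT8": 8, "INT4": 4}


def weight_footprint_gb(num_params, bits):
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes).

    Counts only the raw weights; quantization metadata is ignored.
    """
    return num_params * bits / 8 / 1e9


def footprint_table(num_params=PARAMS):
    """Map each precision name to its approximate weight footprint in GB."""
    return {
        name: weight_footprint_gb(num_params, bits)
        for name, bits in BITS_PER_WEIGHT.items()
    }


if __name__ == "__main__":
    for name, gb in footprint_table().items():
        print(f"{name}: ~{gb:.1f} GB")
```

Under these assumptions, halving the bits per weight halves the weight footprint, so INT4 weights take about one quarter of the space of BF16 weights. Smaller weights also mean less memory traffic per generated token, which is one reason quantized models can show lower latency on CPUs.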




Related articles

  • May 31, 2024: Falcon 2 11B is now available on Amazon SageMaker JumpStart
  • Apr 14, 2026: Deploying Model Context Protocol (MCP) servers on Amazon ECS
  • Oct 2, 2025: Deploying AI models for inference with AWS Lambda using zip packaging
  • Mar 30, 2026: Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2
