Learn how to deploy Falcon 2 11B on Amazon EC2 c7i instances for model inference

Compute Blog

This article describes how to deploy the Falcon 2 11B large language model from the Technology Innovation Institute (TII) on Amazon EC2 c7i instances, using Intel Advanced Matrix Extensions (Intel AMX) for model inference.

Specifically, the article covers:

An introduction to the Falcon 2 11B model and its availability on SageMaker JumpStart
The benefits of INT8 and INT4 model quantization with OpenVINO for faster inference
Benchmark results showing latency and throughput improvements with quantized models
A step-by-step guide to quantizing Falcon 2 11B with OpenVINO and running inference on c7i instances
A conclusion on the cost-effectiveness of deploying large models on CPUs using quantization
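
To make the quantization idea in the list above concrete, here is a minimal, generic sketch of symmetric per-tensor INT8 quantization in plain Python. It is an illustration only and is not OpenVINO's actual algorithm or API. The function names (quantize_int8, dequantize, weight_memory_gb) are hypothetical, and the memory estimate assumes the nominal 11 billion parameters implied by the model name and counts weights only.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (symmetric, per-tensor)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the shared scale."""
    return [q * scale for q in quantized]


def weight_memory_gb(num_params, bits_per_weight):
    """Rough weight-only memory footprint in GB (ignores activations and runtime overhead)."""
    return num_params * bits_per_weight / 8 / 1e9


# Toy weights, chosen only for illustration.
weights = [0.42, -1.27, 0.05, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Assumption: 11e9 parameters, taken from the "11B" in the model name.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(11e9, bits):.1f} GB")
```

Each restored value differs from the original by at most half the scale, which is the rounding error traded for a smaller memory footprint and cheaper integer arithmetic.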


May 31, 2024
Falcon 2 11B is now available on Amazon SageMaker JumpStart

Apr 14, 2026
Deploying Model Context Protocol (MCP) servers on Amazon ECS

Oct 2, 2025
Deploying AI models for inference with AWS Lambda using zip packaging

Mar 30, 2026
Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2


Related articles