This post is a guide to hosting the OpenAI Whisper speech recognition model on Amazon SageMaker. It covers two implementation frameworks, PyTorch and Hugging Face, and three inference options: real-time inference, batch transform jobs, and asynchronous inference.

<div>
<p>This article explains how to deploy the open-source Whisper automatic speech recognition (ASR) model on Amazon SageMaker using the PyTorch and Hugging Face frameworks. It also compares the inference options available on SageMaker: real-time inference, batch transform jobs, and asynchronous inference.</p>
<p>Specifically, the article covers:</p>
<ul>
<li>Saving model artifacts for PyTorch and Hugging Face implementations of Whisper</li>
<li>Selecting appropriate deep learning containers (DLCs) for each framework</li>
<li>Creating SageMaker models for PyTorch and Hugging Face Whisper</li>
<li>Defining custom inference scripts for loading the models</li>
<li>Deploying the models using real-time inference, batch transform jobs, and asynchronous inference</li>
<li>Comparing the inference options in terms of speed, payload size, scalability, and cost</li>
<li>Providing guidance on choosing the appropriate inference option based on use case requirements</li>
</ul>
</div>
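
To make the first and fourth items above concrete, here is a minimal sketch of how saved model artifacts and a custom inference script might be bundled for upload. It assumes the layout commonly used with the SageMaker PyTorch and Hugging Face containers: a gzip-compressed `model.tar.gz` with the model files at the archive root and the inference script under `code/`. The function name `package_model` and the file names in the usage note are hypothetical and chosen only for illustration.

```python
import tarfile
from pathlib import Path


def package_model(model_dir: str, output_path: str = "model.tar.gz") -> list[str]:
    """Bundle every file under model_dir into a gzip-compressed tarball.

    Files are stored relative to model_dir, so they sit at the archive root
    instead of under an extra top-level folder. Returns the archived names.
    """
    root = Path(model_dir)
    output = Path(output_path).resolve()
    members = []
    with tarfile.open(output_path, "w:gz") as tar:
        for path in sorted(root.rglob("*")):
            # Skip directories, and skip the archive itself in case it is
            # being written inside model_dir.
            if not path.is_file() or path.resolve() == output:
                continue
            arcname = path.relative_to(root).as_posix()
            tar.add(path, arcname=arcname)
            members.append(arcname)
    return members
```

For example, if a directory `model/` holds a hypothetical `whisper_model.pt` and `code/inference.py`, then `package_model("model")` writes `model.tar.gz` containing those two entries. The archive would then be uploaded to Amazon S3 and referenced when creating the SageMaker model.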

