Whisper

Deploying vLLM with Audio and LLM Inference on ROSA with GPUs

Red Hat OpenShift Service on AWS (ROSA) provides a managed OpenShift environment that can leverage AWS GPU instances. This guide will walk you through deploying vLLM for both audio transcription (Whisper) and large language model inference on ROSA using GPU instances, along with a web application to interact with both services. Use case Automatically transcribe audio conversations (meetings, customer calls) and analyze content with an LLM to extract insights, decisions, and action items

Interested in contributing to these docs?

Collaboration drives progress. Help improve our documentation The Red Hat Way.

Products

Tools

Try, buy & sell

Communicate

About Red Hat

We’re the world’s leading provider of enterprise open source solutions—including Linux, cloud, container, and Kubernetes. We deliver hardened solutions that make it easier for enterprises to work across platforms and environments, from the core datacenter to the network edge.