
Run Whisper on Apple Silicon in the Cloud: Fast Speech-to-Text with MLX

March 2, 2026 · 6 min read · by Macyou Team

Whisper is OpenAI's open-source speech recognition model. It handles transcription and translation across 99 languages with strong accuracy, even on noisy audio, accented speech, and technical jargon. The model comes in several sizes (tiny, base, small, medium, large), trading speed for accuracy. On Apple Silicon, Whisper gets a significant boost from MLX, Apple's machine learning framework, which runs natively on the GPU through Metal.

MLX Whisper: 4x Faster Than CPU

The mlx-whisper package is a port of Whisper optimized for Apple Silicon. It uses MLX's lazy evaluation and unified memory to run inference on the Metal GPU without the overhead of PyTorch or CoreML conversion. On an M4 Pro, the large-v3 model transcribes audio at roughly 4x real-time speed, so a 60-minute podcast takes about 15 minutes to transcribe. The small model runs at nearly 20x real-time, making it practical for live or near-live transcription.
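
To turn a real-time factor into wall-clock time, divide the audio duration by the factor. Here is a small sketch of that arithmetic using the figures quoted above. The function name is illustrative, and real throughput will vary with the audio and the hardware.

```python
def transcription_minutes(audio_minutes: float, realtime_factor: float) -> float:
    """Estimate wall-clock transcription time from a real-time speed factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_minutes / realtime_factor


# Figures from the text: large-v3 at roughly 4x, small at nearly 20x.
print(transcription_minutes(60, 4))   # 15.0
print(transcription_minutes(60, 20))  # 3.0
```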

Deploying Whisper on Macyou

The Macyou Catalog includes a Whisper stack pre-configured with Python, MLX, and mlx-whisper. Deploy it on an M4 Pro server and start transcribing immediately.

$ ssh root@YOUR_IP
$ mlx_whisper audio.mp3 --model large-v3 --output-format all --output-dir output
Transcribing... 4.2x realtime on M4 Pro GPU

$ ls output/
audio.json  audio.srt  audio.tsv  audio.txt  audio.vtt

Batch Transcription Pipeline

For production use, set up a batch pipeline: drop audio files into a watched directory, and a simple script transcribes each one and writes SRT, TXT, and JSON files. Combine this with a web frontend or API wrapper and you have a private, self-hosted transcription service. No audio data leaves your server, which matters for legal recordings, medical dictation, and confidential meetings.
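
One way such a script might look is sketched below. It is a minimal example, not the Macyou stack's own tooling: the directory layout, the helper names, and the list of audio extensions are all assumptions, and the model repo name is assumed to be the MLX build of large-v3 on Hugging Face. The transcriber is passed in as a function so the file handling can be tried without a GPU.

```python
import json
from pathlib import Path
from typing import Callable

AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".flac"}  # assumed set of inputs


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments (start, end, text) as SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        cues.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    return "\n".join(cues)


def mlx_transcribe(path: Path) -> dict:
    """Transcribe one file with mlx-whisper (imported lazily)."""
    import mlx_whisper  # needs Apple Silicon and the mlx-whisper package

    # Assumption: this Hugging Face repo holds the MLX build of large-v3.
    return mlx_whisper.transcribe(
        str(path), path_or_hf_repo="mlx-community/whisper-large-v3-mlx"
    )


def process_inbox(
    inbox: Path,
    outbox: Path,
    transcribe: Callable[[Path], dict] = mlx_transcribe,
) -> list[Path]:
    """Transcribe each new audio file in `inbox`; return the files processed."""
    outbox.mkdir(parents=True, exist_ok=True)
    done = []
    for audio in sorted(inbox.iterdir()):
        if audio.suffix.lower() not in AUDIO_SUFFIXES:
            continue
        json_path = outbox / f"{audio.stem}.json"
        if json_path.exists():  # already transcribed on an earlier pass
            continue
        result = transcribe(audio)
        text = result["text"].strip() + "\n"
        (outbox / f"{audio.stem}.txt").write_text(text, encoding="utf-8")
        srt = segments_to_srt(result["segments"])
        (outbox / f"{audio.stem}.srt").write_text(srt, encoding="utf-8")
        # JSON is written last so it doubles as the "done" marker.
        json_path.write_text(json.dumps(result, ensure_ascii=False), encoding="utf-8")
        done.append(audio)
    return done
```

Calling `process_inbox(Path("inbox"), Path("output"))` from a loop that sleeps a few seconds between passes, or from a scheduled job, gives the watched-directory behavior. Because the JSON file is written last, a run that is interrupted mid-file is simply retried on the next pass.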

Recommended Tier

Whisper large-v3 needs about 6 GB of memory, so even the Starter tier (from $99/mo, 16–32 GB) handles it comfortably. For concurrent transcription jobs or running alongside other services, a 32 GB build or an M4 Pro is better. See pricing.
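
For a rough sense of how many jobs fit on a given build, here is a back-of-the-envelope sketch. It assumes each concurrent job loads its own copy of the model at about 6 GB, and the 4 GB reserved for the OS and other services is a guess, not a measured figure. Treat the results as a starting point for your own testing.

```python
def max_concurrent_jobs(
    total_gb: float, per_job_gb: float = 6.0, reserved_gb: float = 4.0
) -> int:
    """Estimate how many large-v3 jobs fit in memory at once.

    per_job_gb comes from the ~6 GB figure in the text; reserved_gb is an
    assumed allowance for the OS and other processes.
    """
    usable = total_gb - reserved_gb
    return max(0, int(usable // per_job_gb))


print(max_concurrent_jobs(16))  # 2
print(max_concurrent_jobs(32))  # 4
```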

Start transcribing — deploy Whisper on dedicated Apple Silicon now.
