Transcription Setup

Configure local or cloud speech-to-text. Choose the right engine for your privacy, speed, and accuracy needs.

Overview

VoxInk supports multiple transcription modes so you can choose the right balance of privacy, speed, and accuracy for your workflow. You can switch between modes at any time from the Transcription submenu in the menu bar — no restart required.

There are three categories of transcription engine:

On This Mac: Whisper AI running locally on your Apple Silicon chip. Completely private, no internet needed.
Cloud providers: Deepgram or OpenAI Whisper. Fast and accurate, but audio is sent to external servers.
Custom server: your own transcription server, for IT-managed setups or company infrastructure.

On This Mac (Local Transcription)

Local transcription uses Whisper AI running entirely on your device. Your audio is processed by your Mac's Apple Silicon chip and never leaves your computer. This is the default mode and the best choice for anyone who values privacy.

How it works

You hold the hotkey and speak into your microphone.
When you release the hotkey, the recorded audio is passed to the local Whisper engine.
Whisper processes the audio using your Mac's Neural Engine and GPU.
The transcribed text is pasted at your cursor — no network requests involved.

Model sizes

VoxInk offers different quality levels for local transcription. Higher quality uses more storage but gives more accurate results. The first time you select a quality level, VoxInk downloads the model automatically. This usually takes a minute or two, but the larger models can take longer on a slow connection.

Model	Size	Speed	Accuracy	Best For
`tiny`	75 MB	Fastest	Basic	Quick notes, testing, very short clips
`small`	466 MB	Fast	Good	General everyday dictation
`medium`	1.5 GB	Moderate	Better	Professional use, meetings
`large-v3-turbo`	1.6 GB	Moderate	Excellent	Recommended default — best balance of speed and accuracy
`large-v3`	3 GB	Slower	Best	Maximum accuracy, clinical documentation

The default model is `large-v3-turbo`, which provides excellent accuracy with reasonable processing speed on all Apple Silicon Macs. On M1 Pro/Max or newer chips, even the `large-v3` model runs comfortably.
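If storage is the deciding factor, the table can be read as a simple lookup. The sketch below is illustrative only: the `MODELS` list copies the sizes from the table (rounding 1.5 GB to 1500 MB and so on), and `largest_model_within` is a hypothetical helper, not something VoxInk provides.

```python
# Model sizes in MB, copied from the table above (GB values rounded to MB).
MODELS = [
    ("tiny", 75),
    ("small", 466),
    ("medium", 1500),
    ("large-v3-turbo", 1600),
    ("large-v3", 3000),
]


def largest_model_within(budget_mb):
    """Return the name of the largest model that fits in budget_mb, or None."""
    fitting = [(size, name) for name, size in MODELS if size <= budget_mb]
    if not fitting:
        return None
    # max() compares sizes first, so this picks the biggest model that fits.
    return max(fitting)[1]


print(largest_model_within(2000))  # large-v3-turbo
```

With about 2 GB to spare, this picks `large-v3-turbo`, which is also the default.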

How to select local transcription

Click the VoxInk icon in your menu bar.
Hover over Transcription to open the submenu.
Select "On This Mac".
The first time you use a model, it will download automatically (this may take a minute depending on your connection speed).

Changing the quality level

To change the quality level, click the VoxInk menu bar icon → Transcription → On This Mac, then select your preferred model. If the model list isn't shown in the menu, open your settings file at `~/Library/Application Support/VoxInk/config.json` and set the `stt_local_model` value to a model name from the Model sizes table.
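If you prefer to script the settings-file change, something like the following could work. It is a minimal sketch that assumes `config.json` holds a single JSON object with `stt_local_model` as a top-level key; the rest of the file's layout is not documented here, so the helper (a hypothetical `set_local_model`) leaves every other key untouched.

```python
import json
from pathlib import Path

# Settings file location given in the paragraph above.
CONFIG_PATH = Path.home() / "Library/Application Support/VoxInk/config.json"

# Model names from the "Model sizes" table.
VALID_MODELS = {"tiny", "small", "medium", "large-v3-turbo", "large-v3"}


def set_local_model(model, config_path=CONFIG_PATH):
    """Set stt_local_model in the settings file, keeping all other keys."""
    if model not in VALID_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    path = Path(config_path)
    # Assumption: the file is one JSON object; start from {} if it is missing.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["stt_local_model"] = model
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `set_local_model("medium")` would switch to the medium model. Keeping a backup copy of the file before editing it is a sensible precaution.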

Performance tips

First use is slower — VoxInk needs to prepare the AI engine the first time after you open it. After that, dictation is much faster.
Close heavy apps — If you're running video editors or other demanding programs, transcription may be slower.
Recording length — Local transcription handles recordings up to several minutes long. For very long recordings (5+ minutes), consider using a cloud provider for best results.

Cloud Providers

Cloud transcription sends your audio to an external service for processing. This can be faster than local transcription (especially on older chips such as the base M1) and handles very long recordings well. The tradeoff is that your audio data leaves your device.

Deepgram

Deepgram is a cloud speech-to-text provider known for speed and accuracy. It offers a generous free tier that's perfect for getting started.

Create a Deepgram account
Go to console.deepgram.com and sign up for a free account.
Create an access code (API key)
In the Deepgram dashboard, go to API Keys and create a new key.
Select Deepgram in VoxInk
Click the VoxInk menu bar icon → Transcription → Deepgram.
Enter your access code
Paste your Deepgram access code into the prompt and click Save.
Start dictating
Your audio will now be processed by Deepgram's servers. Results are typically returned in under a second.

Tip

Deepgram offers $200 in free credit when you sign up — enough for hundreds of hours of transcription. No credit card required to get started.

OpenAI Whisper

OpenAI's Whisper API provides cloud-hosted Whisper transcription. It uses the same Whisper model family as the local "On This Mac" option, but runs on OpenAI's servers.

Create an OpenAI account
Go to platform.openai.com and sign up or log in.
Add billing
OpenAI's Whisper API requires an active billing account. Add a payment method under Settings → Billing in your OpenAI account.
Create an access code
Go to API Keys and create a new key. Copy it straight away — OpenAI only shows it once.
Select OpenAI in VoxInk
Click the VoxInk menu bar icon → Transcription → OpenAI Whisper.
Enter your access code
Paste your OpenAI access code into the prompt and click Save.

Note

OpenAI charges by usage. As of early 2026, transcription costs about $0.006 per minute of audio, so a typical 30-second dictation costs about $0.003, a fraction of a cent.
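To estimate your own costs, the arithmetic is simply minutes of audio multiplied by the per-minute rate. The sketch below uses the rate quoted in the note and assumes plain linear billing with no rounding or minimum charge, which may not match how OpenAI actually bills; `estimate_cost_usd` is a hypothetical helper.

```python
# Rate quoted in the note above (USD per minute of audio, early 2026).
RATE_USD_PER_MINUTE = 0.006


def estimate_cost_usd(seconds, rate=RATE_USD_PER_MINUTE):
    """Estimate the cost in USD of transcribing `seconds` of audio."""
    return seconds / 60 * rate


print(f"{estimate_cost_usd(30):.4f}")    # 0.0030 for a 30-second dictation
print(f"{estimate_cost_usd(3600):.2f}")  # 0.36 for one hour of audio
```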

Custom Server (Other Server)

Connect VoxInk to your own transcription server. This is an advanced option typically set up by IT teams. It's useful for:

Self-hosted transcription servers
Company or institutional AI systems
Third-party transcription services
Local AI servers (e.g., the Compass Dental AI Server)

Setup

Select Other Server in VoxInk
Click the VoxInk menu bar icon → Transcription → Other Server.
Enter the server URL
Provide the full URL to the transcription endpoint. For example:
- http://your-server:8000/v1/audio/transcriptions
- https://ai.yourcompany.com/v1/stt/transcribe
- http://192.168.1.100:8080/v1/audio/transcriptions
Enter an API key (optional)
If your server requires authentication, enter the API key. Leave this field empty if your server doesn't use authentication.
Click Save
VoxInk will now send audio to your custom server for transcription.

Server requirements (for developers)

If you're setting up a custom server, it needs to:

Accept audio file uploads and return text
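Use the OpenAI-compatible API format: a multipart/form-data POST with the audio in a file field, answered with JSON that carries the transcript in a text field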
Support common audio formats (WAV, MP3, M4A)

For full technical details, see the OpenAI Audio API reference.
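For developers testing a new server, it can help to build a request by hand. The sketch below constructs (but does not send) a request in the shape of OpenAI's transcription endpoint, using only the Python standard library. The exact fields VoxInk sends are not documented here, so the `model` value and the function names are assumptions for illustration.

```python
import json
import urllib.request
import uuid


def build_transcription_request(url, audio_bytes, filename="clip.wav",
                                api_key=None, model="whisper-1"):
    """Build an OpenAI-style multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"  # assumption: a custom server may ignore this field
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        # Only sent when the server requires authentication.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        url, data=head + audio_bytes + tail, headers=headers, method="POST"
    )


def parse_transcription_response(raw):
    """Extract the transcript from a JSON body such as {"text": "..."}."""
    return json.loads(raw)["text"]
```

To try it against a running server, pass the request to `urllib.request.urlopen`, read the response body, and hand it to `parse_transcription_response`.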

Choosing the Right Mode

Not sure which transcription mode to use? Here's a quick decision guide based on what matters most to you:

Your Priority	Recommended Mode	Why
Privacy	On This Mac	Audio never leaves your device. No network requests, no external servers.
Speed	Cloud (Deepgram)	Deepgram typically returns results in under a second, even for longer recordings.
Accuracy	On This Mac (`large-v3`) or Cloud	The largest local model matches cloud accuracy. Cloud providers are also excellent.
No internet access	On This Mac	The only option that works completely offline.
Low disk space	Cloud (Deepgram or OpenAI)	No local model download required. The app footprint stays minimal.
Long recordings	Cloud (Deepgram)	Cloud providers handle very long audio without memory constraints.
Specialised vocabulary	On This Mac + Vocabulary	Combine local transcription with VoxInk's vocabulary biasing for domain-specific terms.

Tip

You can switch transcription modes at any time without restarting VoxInk. Try different options and see which works best for your workflow. Many users use local transcription for everyday dictation and switch to cloud for longer sessions or when they need maximum speed.
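The decision guide can also be written out as a short function. This is only one way to combine the rows: it assumes privacy and offline use outrank every other priority, which the guide itself does not state, and `choose_mode` is a hypothetical helper rather than a VoxInk feature.

```python
def choose_mode(needs_privacy=False, offline=False,
                recording_minutes=0.0, low_disk_space=False):
    """Suggest a transcription mode based on the decision guide."""
    # Assumption: privacy and offline use outrank every other priority.
    if needs_privacy or offline:
        return "On This Mac"
    # "5+ minutes" is where the performance tips suggest a cloud provider.
    if recording_minutes >= 5:
        return "Cloud (Deepgram)"
    if low_disk_space:
        return "Cloud (Deepgram or OpenAI)"
    # Local transcription is the default mode.
    return "On This Mac"


print(choose_mode(recording_minutes=12))  # Cloud (Deepgram)
```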

Troubleshooting

If transcription isn't working as expected, try these common fixes:

Local transcription is slow

The first transcription after launching VoxInk is always slower because the model has to load. Subsequent transcriptions should be faster.
Try a smaller model — switch from large-v3 to large-v3-turbo or medium.
Close GPU-intensive applications (video editors, games) to free up Metal GPU resources.

Cloud transcription fails

Verify your API key is correct and hasn't expired.
Check your internet connection.
For OpenAI: ensure you have an active billing account with available credit.
For Deepgram: check that your free credit hasn't been exhausted.

Custom server not working

Check the server address is correct and the server is running.
Ask your IT team to verify the server is responding.
Double-check the API key if your server requires one.
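The first of these checks can be scripted. The sketch below only tests whether the server's host and port accept a TCP connection; it does not confirm that the transcription endpoint itself responds correctly. Both function names are hypothetical.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(url):
    """Split a server URL into (host, port), defaulting to 80 or 443."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the server's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `is_reachable("http://192.168.1.100:8080/v1/audio/transcriptions")` returns `False` if nothing is listening at that address.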

Transcription is inaccurate

Try a larger model (if using local transcription).
Ensure your microphone input level is adequate — speak at a normal volume and distance.
Reduce background noise or use a directional microphone.
Enable Vocabulary hints for domain-specific terms (dental, medical, legal, etc.).
Make sure the correct Language is selected in the menu bar.