Transcription Setup

Configure local or cloud speech-to-text. Choose the right engine for your privacy, speed, and accuracy needs.

Overview

VoxInk supports multiple transcription modes so you can choose the right balance of privacy, speed, and accuracy for your workflow. You can switch between modes at any time from the Transcription submenu in the menu bar — no restart required.

There are three categories of transcription engine:

  • On This Mac — Whisper AI running locally on your Apple Silicon chip. Completely private, no internet needed.
  • Cloud providers — Deepgram or OpenAI Whisper. Fast and accurate, but audio is sent to external servers.
  • Custom server — Your own transcription server, for IT-managed setups or company infrastructure.

On This Mac (Local Transcription)

Local transcription uses Whisper AI running entirely on your device. Your audio is processed by your Mac's Apple Silicon chip and never leaves your computer. This is the default mode and the best choice for anyone who values privacy.

How it works

  1. You hold the hotkey and speak into your microphone.
  2. When you release, the recorded audio is sent to the local Whisper engine.
  3. Whisper processes the audio using your Mac's Neural Engine and GPU.
  4. The transcribed text is pasted at your cursor — no network requests involved.

Model sizes

VoxInk offers different quality levels for local transcription. Higher quality uses more storage but gives more accurate results. The first time you select a quality level, VoxInk downloads the model automatically; how long this takes depends on the model size and your connection speed.

Model           Size    Speed     Accuracy   Best For
tiny            75 MB   Fastest   Basic      Quick notes, testing, very short clips
small           466 MB  Fast      Good       General everyday dictation
medium          1.5 GB  Moderate  Better     Professional use, meetings
large-v3-turbo  1.6 GB  Moderate  Excellent  Recommended default: best balance of speed and accuracy
large-v3        3 GB    Slower    Best       Maximum accuracy, clinical documentation

The default model is large-v3-turbo, which provides excellent accuracy with reasonable processing speed on all Apple Silicon Macs. On M1 Pro/Max or newer chips, even the large-v3 model runs comfortably.
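
As a rough illustration of the storage trade-off between models, the following Python sketch picks the most accurate model that fits in a given amount of free disk space. The sizes come from the model table (treating 1 GB as 1000 MB). The function name and the headroom parameter are hypothetical and are not part of VoxInk.

```python
# Approximate download sizes in MB, taken from the model table
# (1 GB is treated as 1000 MB). Ordered from least to most accurate.
MODEL_SIZES_MB = [
    ("tiny", 75),
    ("small", 466),
    ("medium", 1500),
    ("large-v3-turbo", 1600),
    ("large-v3", 3000),
]


def largest_model_that_fits(free_mb, headroom_mb=500):
    """Return the most accurate model whose download fits in free_mb.

    headroom_mb is an assumed safety margin, not a VoxInk setting.
    Returns None if even the smallest model does not fit.
    """
    budget = free_mb - headroom_mb
    best = None
    for name, size_mb in MODEL_SIZES_MB:
        if size_mb <= budget:
            best = name  # later entries are more accurate, so keep the last fit
    return best
```

To try it against the real disk, the free space in MB can be read with shutil.disk_usage("/").free // 1_000_000 from the Python standard library.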

How to select local transcription

  1. Click the VoxInk icon in your menu bar.
  2. Hover over Transcription to open the submenu.
  3. Select "On This Mac".
  4. The first time you use a model, it will download automatically (this may take a minute depending on your connection speed).

Changing the Quality Level

To change which quality level VoxInk uses, click the VoxInk menu bar icon → Transcription → On This Mac → select your preferred model. If the option isn't shown in the menu, you can set it in your settings file at ~/Library/Application Support/VoxInk/config.json by changing the stt_local_model value.
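
If you prefer to script the settings-file change, here is a minimal Python sketch. It assumes that config.json holds a single JSON object with stt_local_model as a top-level key, and that the accepted values match the model names in the table. Neither assumption is confirmed by this guide, so back up the file before trying it. The function name is hypothetical.

```python
import json
from pathlib import Path

# Settings file location as given in this guide.
CONFIG_PATH = Path.home() / "Library/Application Support/VoxInk/config.json"

# Assumption: accepted values match the model names in the table.
KNOWN_MODELS = {"tiny", "small", "medium", "large-v3-turbo", "large-v3"}


def set_local_model(model, config_path=CONFIG_PATH):
    """Set stt_local_model in the settings file, keeping all other keys."""
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    config_path = Path(config_path)
    # Assumption: the file is one JSON object with top-level keys.
    # Raises FileNotFoundError if the settings file does not exist yet.
    config = json.loads(config_path.read_text())
    config["stt_local_model"] = model
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling set_local_model("medium") would then switch the local model to medium. Whether VoxInk picks up the change while it is running is not stated here, so reopening the app afterwards is the cautious choice.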

Performance tips

  • First use is slower — VoxInk needs to prepare the AI engine the first time after you open it. After that, dictation is much faster.
  • Close heavy apps — If you're running video editors or other demanding programs, transcription may be slower.
  • Recording length — Local transcription handles recordings up to several minutes long. For very long recordings (5+ minutes), consider using a cloud provider for best results.

Cloud Providers

Cloud transcription sends your audio to an external service for processing. This can be faster than local transcription (especially on older M1 chips) and handles very long recordings well. The tradeoff is that your audio data leaves your device.

Deepgram

Deepgram is a cloud speech-to-text provider known for speed and accuracy. They offer a generous free tier that's perfect for getting started.

  1. Create a Deepgram account

    Go to console.deepgram.com and sign up for a free account.

  2. Create an access code

    In the Deepgram dashboard, go to API Keys and click to create a new one.

  3. Select Deepgram in VoxInk

    Click the VoxInk menu bar icon → Transcription → Deepgram.

  4. Enter your access code

    Paste your Deepgram access code into the prompt and click Save.

  5. Start dictating

    Your audio will now be processed by Deepgram's servers. Results are typically returned in under a second.

Tip
Deepgram offers $200 in free credit when you sign up — enough for hundreds of hours of transcription. No credit card required to get started.

OpenAI Whisper

OpenAI's Whisper API provides cloud-hosted Whisper transcription. It uses the same Whisper model family as the local "On This Mac" option, but runs on OpenAI's servers.

  1. Create an OpenAI account

    Go to platform.openai.com and sign up or log in.

  2. Add billing

    OpenAI's Whisper API requires an active billing account. Add a payment method under Settings → Billing.

  3. Create an access code

    Go to API Keys and create a new key. Copy it straight away — OpenAI only shows it once.

  4. Select OpenAI in VoxInk

    Click the VoxInk menu bar icon → Transcription → OpenAI Whisper.

  5. Enter your access code

    Paste your OpenAI access code into the prompt and click Save.

Note
OpenAI charges based on how much you use it. As of early 2026, it costs about $0.006 per minute of audio — so a typical 30-second dictation costs a fraction of a cent.
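
The arithmetic behind that estimate can be sketched in a few lines of Python. The rate is the one quoted in the note. The sketch assumes simple per-second proration with no minimum charge or rounding, which may not match how OpenAI actually bills.

```python
# Rate quoted in the note above (USD per minute of audio, as of early 2026).
USD_PER_MINUTE = 0.006


def estimate_cost_usd(seconds, usd_per_minute=USD_PER_MINUTE):
    """Estimate the charge for a recording of the given length in seconds."""
    # Assumption: cost scales linearly with duration, with no minimum charge.
    return seconds / 60 * usd_per_minute


print(f"{estimate_cost_usd(30):.4f}")       # 30-second dictation: prints 0.0030
print(f"{estimate_cost_usd(60 * 60):.2f}")  # one hour of audio: prints 0.36
```

So a 30-second dictation comes to about $0.003, and a full hour of audio to about $0.36.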

Custom Server (Other Server)

Connect VoxInk to your own transcription server. This is an advanced option typically set up by IT teams. It's useful for:

  • Self-hosted transcription servers
  • Company or institutional AI systems
  • Third-party transcription services
  • Local AI servers (e.g., the Compass Dental AI Server)

Setup

  1. Select Other Server in VoxInk

    Click the VoxInk menu bar icon → Transcription → Other Server.

  2. Enter the server URL

    Provide the full URL to the transcription endpoint. For example:

    • http://your-server:8000/v1/audio/transcriptions
    • https://ai.yourcompany.com/v1/stt/transcribe
    • http://192.168.1.100:8080/v1/audio/transcriptions

  3. Enter an API key (optional)

    If your server requires authentication, enter the API key. Leave this field empty if your server doesn't use authentication.

  4. Click Save

    VoxInk will now send audio to your custom server for transcription.

Server requirements (for developers)

If you're setting up a custom server, it needs to:

  • Accept audio file uploads and return text
  • Use the OpenAI-compatible API format
  • Support common audio formats (WAV, MP3, M4A)

For full technical details, see the OpenAI Audio API reference.
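
For developers who want to test a server before pointing VoxInk at it, the sketch below builds a request in the OpenAI audio transcription format using only the Python standard library: a multipart/form-data POST with a file field and a model field, an optional Bearer token, and a JSON response containing a text field. It is a test client under those assumptions, not a description of the exact request VoxInk sends. The function names, the default filename, and the default model value are illustrative.

```python
import json
import urllib.request
import uuid


def build_transcription_request(url, audio_bytes, filename="clip.wav",
                                model="whisper-1", api_key=None):
    """Build (but do not send) a multipart POST in the OpenAI audio format."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field "model"; the value your server expects may differ.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # Form field "file" carrying the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:  # authentication is optional for custom servers
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(url, data=b"".join(parts),
                                  headers=headers, method="POST")


def transcribe(request):
    """Send the request and return the "text" field of the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

A typical check would read a short WAV file into bytes, pass it to build_transcription_request together with your endpoint URL, and print the result of transcribe. If the server returns the spoken text, it follows the format described above.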

Choosing the Right Mode

Not sure which transcription mode to use? Here's a quick decision guide based on what matters most to you:

Your Priority           Recommended Mode                 Why
Privacy                 On This Mac                      Audio never leaves your device. No network requests, no external servers.
Speed                   Cloud (Deepgram)                 Deepgram returns results in under a second, even for longer recordings.
Accuracy                On This Mac (large-v3) or Cloud  The largest local model matches cloud accuracy. Cloud providers are also excellent.
No internet access      On This Mac                      The only option that works completely offline.
Low disk space          Cloud (Deepgram or OpenAI)       No local model download required. The app footprint stays minimal.
Long recordings         Cloud (Deepgram)                 Cloud providers handle very long audio without memory constraints.
Specialised vocabulary  On This Mac + Vocabulary         Combine local transcription with VoxInk's vocabulary biasing for domain-specific terms.

Tip
You can switch transcription modes at any time without restarting VoxInk. Try different options and see which works best for your workflow. Many users use local transcription for everyday dictation and switch to cloud for longer sessions or when they need maximum speed.

Troubleshooting

If transcription isn't working as expected, try these common fixes:

Local transcription is slow

  • The first transcription after launching VoxInk is always slower (model loading). Subsequent transcriptions should be faster.
  • Try a smaller model — switch from large-v3 to large-v3-turbo or medium.
  • Close GPU-intensive applications (video editors, games) to free up Metal GPU resources.

Cloud transcription fails

  • Verify your API key is correct and hasn't expired.
  • Check your internet connection.
  • For OpenAI: ensure you have an active billing account with available credit.
  • For Deepgram: check that your free credit hasn't been exhausted.

Custom server not working

  • Check the server address is correct and the server is running.
  • Ask your IT team to verify the server is responding.
  • Double-check the access code if one is required.
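
To check the first point from a terminal, a small Python sketch like the one below can tell you whether anything is listening at the server address. It only tests that a TCP connection can be opened to the host and port in the URL. It does not confirm that the endpoint path is right or that the access code is accepted. The function names and the timeout value are illustrative.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(url):
    """Extract (host, port) from a URL, defaulting to 443 for https and 80 for http."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def is_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the server's host and port succeeds."""
    host, port = endpoint_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False
```

For example, is_reachable("http://192.168.1.100:8080/v1/audio/transcriptions") returning False suggests the server is down, the address is wrong, or a firewall is blocking the port.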

Transcription is inaccurate

  • Try a larger model (if using local transcription).
  • Ensure your microphone input level is adequate — speak at a normal volume and distance.
  • Reduce background noise or use a directional microphone.
  • Enable Vocabulary hints for domain-specific terms (dental, medical, legal, etc.).
  • Make sure the correct Language is selected in the menu bar.