Speech Recognition Providers in Hedy
What are Speech Recognition Providers?
Hedy supports multiple speech recognition options, so you can choose between complete privacy with local processing and cloud-based alternatives. You can switch providers at any time: use a local engine for offline sessions and a cloud service when you want its specific features.
Getting Started
- Open the Hedy app
- Navigate to Settings (tap your profile icon)
- Scroll to “Speech Recognition Options”
- Select your preferred provider from the dropdown menu
- Configure provider-specific settings if needed
- Your selection takes effect in the next recording session
Available Providers
Hedy offers four speech recognition options, each with unique characteristics:
- Local Speech Recognition (Whisper): Default option - 100% private, works offline, no usage costs. Your audio never leaves your device. Available on every platform Hedy runs on.
- Local Speech Recognition (Parakeet) [Beta]: A newer on-device engine that runs entirely on your device’s Neural Engine. Optimized for English and major European languages with faster, lower-latency transcripts. Available on Apple Silicon Macs and on iPhone 12 (or newer) and iPad Air 4 (or newer) running iOS 17 or later. Requires a one-time ~2.5 GB model download.
- Deepgram: Cloud-based service with real-time streaming and smart formatting features. Uses Nova-3, which supports dozens of languages. Hedy exposes every language Nova-3 offers, so you can transcribe meetings in any supported language without switching providers. Requires your own API key.
- OpenAI: Cloud transcription with Voice Activity Detection and automatic language detection. Hedy automatically continues long sessions past OpenAI’s 60-minute per-connection cap by rotating connections behind the scenes, so hour-plus meetings keep going without interruption. Requires your own API key.
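To picture how connection rotation keeps a long session going, here is a minimal sketch in Python. It is not Hedy's actual implementation: the function name, the 5-minute safety margin, and the idea of planning windows up front are all assumptions made for illustration. Only the 60-minute per-connection cap comes from the description above.

```python
def plan_connections(session_minutes, cap_minutes=60, margin_minutes=5):
    """Split a session into connection windows that each end before the cap.

    margin_minutes is a hypothetical safety buffer so a new connection
    is opened a little before the old one would hit the cap.
    """
    if session_minutes <= 0:
        return []
    window = cap_minutes - margin_minutes
    if window <= 0:
        raise ValueError("margin_minutes must be smaller than cap_minutes")

    windows = []
    start = 0
    while start < session_minutes:
        end = min(start + window, session_minutes)
        windows.append((start, end))  # (start minute, end minute)
        start = end
    return windows


# A 150-minute meeting is covered by three back-to-back connections.
print(plan_connections(150))
```

Each window hands over to the next one with no gap, which is why an hour-plus meeting appears as a single uninterrupted transcript.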
Configuring Local Speech Recognition (Whisper)
When using Whisper, you can optimize for your device and needs:
For macOS Users:
- Small Model: Fastest processing, recommended for Intel Macs
- Regular Model: Balanced speed and accuracy for most users
- Large Model: Enhanced capabilities for non-English languages (requires 1.5GB download)
For iOS/Android Users:
- Standard Model: Default option suitable for most devices
- Large Model: Alternative model option (an iPhone 12 or newer, or an Android device from 2024 or later, is recommended)
Voice Activity Detection (VAD):
VAD automatically filters out silence and background noise to improve transcription quality. This feature is enabled by default for Whisper.
- Enable/Disable: Toggle VAD on or off based on your recording environment
- Sensitivity: Adjust from “High Sensitivity” (captures more speech, including quieter sounds) to “Maximum Filtering” (only captures clear speech, filters more background noise)
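If you are curious how a sensitivity setting can change what gets transcribed, the sketch below shows a very simple energy-based filter in Python. It is an illustration only and not how Hedy's VAD works internally: the function names and both threshold values are hypothetical, and real VAD systems use more sophisticated models.

```python
import math


def rms(frame):
    """Root-mean-square loudness of one audio frame (samples in -1.0..1.0)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def keep_speech_frames(frames, threshold):
    """Keep only frames whose loudness reaches the threshold."""
    return [f for f in frames if rms(f) >= threshold]


# Hypothetical thresholds: a low value keeps quieter sounds,
# a high value keeps only clearly audible speech.
HIGH_SENSITIVITY = 0.01
MAXIMUM_FILTERING = 0.10

silence = [0.0] * 160
quiet_voice = [0.02] * 160
clear_voice = [0.5] * 160
frames = [silence, quiet_voice, clear_voice]

print(len(keep_speech_frames(frames, HIGH_SENSITIVITY)))   # quiet and clear speech kept
print(len(keep_speech_frames(frames, MAXIMUM_FILTERING)))  # only clear speech kept
```

The trade-off is the same one the slider describes: a lower threshold captures soft speech but lets more background noise through, while a higher threshold does the opposite.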
Transcript Speed Settings:
- Slower: Waits for complete sentences before displaying
- Normal: Balanced speed and display timing
- Faster: Near real-time display with more frequent updates
Configuring Local Speech Recognition (Parakeet)
Parakeet is currently in Beta. It transcribes entirely on-device using your iPhone, iPad, or Mac’s Neural Engine, and aims to give you a faster, lower-latency transcript than Whisper for supported languages.
Device requirements:
- Apple Silicon Mac (M1 or newer), or
- iPhone 12 family or newer, or iPad Air 4 or newer, running iOS 17 or later
First-time setup:
- Select Local Speech Recognition (Parakeet) from the provider dropdown
- Tap Download Parakeet Model (~2.5 GB) - we recommend Wi-Fi
- Once the download finishes, Parakeet is used automatically in your next session
Language support:
Parakeet works best for English and the major European languages. It may occasionally misidentify similar languages. If transcripts come out in the wrong language, switch back to Whisper for that session.
Automatic fallback:
If Parakeet cannot start a session on your device (for example, after an OS update changes the on-device model format), Hedy automatically falls back to Whisper for that session and offers a one-tap prompt to download the new Parakeet model from Settings. You won’t lose the session.
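The fallback behavior described above follows a common "try the preferred engine, fall back on failure" pattern. Here is a minimal Python sketch of that pattern. Everything in it is hypothetical (the exception class, the function names, and the starter callables); it only illustrates the idea and says nothing about Hedy's real code.

```python
class EngineUnavailable(Exception):
    """Raised by a hypothetical engine starter when it cannot start."""


def start_with_fallback(start_parakeet, start_whisper):
    """Try Parakeet first; on failure, start Whisper instead.

    Returns (session, needs_model_download). The second value is True
    when the fallback was used, so the app can offer the download prompt.
    """
    try:
        return start_parakeet(), False
    except EngineUnavailable:
        return start_whisper(), True


def broken_parakeet():
    # Simulates a model that no longer loads after an OS update.
    raise EngineUnavailable("on-device model format changed")


session, needs_download = start_with_fallback(broken_parakeet, lambda: "whisper-session")
print(session, needs_download)
```

Because the fallback happens before recording starts, the session itself is never lost; only the engine behind it changes.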
Setting Up Cloud Providers
Deepgram Setup:
- Create an account at console.deepgram.com
- Generate an API key from your dashboard
- In Hedy Settings, select Deepgram from the dropdown
- Paste your API key and tap “Test” to verify
- Choose your model and language preferences
- Set maximum session duration to control costs
OpenAI Setup:
- Get your API key from platform.openai.com/api-keys
- In Hedy Settings, select OpenAI from the dropdown
- Enter your API key and test the connection
- Choose your preferred model
- Optionally enable Voice Activity Detection with adjustable sensitivity
- Set maximum session duration for cost control
Choosing the Right Provider
Select based on your priorities and use case:
- Privacy First: Use either local engine (Whisper or Parakeet) - audio never leaves your device
- Offline Use: Both local engines work without internet
- Cloud Features: Deepgram and OpenAI offer cloud-based processing
- Voice Detection: Whisper and OpenAI include Voice Activity Detection features
- Smart Formatting: Deepgram offers automatic formatting options
- No Usage Costs: Local engines (Whisper, Parakeet) have no per-minute charges
- Faster On-Device Transcription: On supported Apple Silicon Macs, iPhones, and iPads, Parakeet (Beta) typically delivers a lower-latency transcript than Whisper for English and major European languages
- Maximum Language Coverage On-Device: For non-European languages on-device, prefer Whisper Large
- Fully Private Analysis: On macOS (Apple Silicon) or Windows, you can pair local speech recognition with Local AI Processing to keep both transcription and AI analysis fully on-device.
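The guidance above can be condensed into a small decision helper. The Python sketch below is one possible reading of those bullets, not an official recommendation engine: the function name, its parameters, and the order in which the priorities are checked are all assumptions.

```python
def suggest_provider(needs_private_or_offline,
                     parakeet_supported=False,
                     european_language=True,
                     wants_smart_formatting=False,
                     wants_auto_language_detection=False):
    """Suggest a provider name from a few yes/no priorities."""
    if needs_private_or_offline:
        # Both local engines keep audio on the device and work offline.
        if parakeet_supported and european_language:
            return "Local Speech Recognition (Parakeet)"
        return "Local Speech Recognition (Whisper)"
    if wants_smart_formatting:
        return "Deepgram"
    if wants_auto_language_detection:
        return "OpenAI"
    # No strong preference: stay with the default option.
    return "Local Speech Recognition (Whisper)"


print(suggest_provider(True, parakeet_supported=True))
print(suggest_provider(True, parakeet_supported=True, european_language=False))
print(suggest_provider(False, wants_smart_formatting=True))
```

Privacy and offline use are checked first here because they rule out the cloud providers entirely; if your priorities differ, reorder the checks to match.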
Cost Considerations
Understanding the cost implications of each provider:
- Local Speech Recognition (Whisper): Free - no usage charges
- Local Speech Recognition (Parakeet): Free - no usage charges (one-time ~2.5 GB model download)
- Deepgram: Pay-per-minute pricing (check current rates on their dashboard)
- OpenAI: Usage-based pricing (check current rates on their platform)
The maximum session duration setting helps prevent accidental overnight recordings and manage API costs.
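To see why a session cap matters, it helps to work out the worst case. The Python sketch below multiplies a session length by a per-minute rate. The rate used is a placeholder chosen only to make the arithmetic easy to follow; it is not a real Deepgram or OpenAI price, so substitute the current rate from your provider's pricing page.

```python
from decimal import Decimal


def worst_case_cost(max_session_minutes, rate_per_minute):
    """Upper bound on the cost of one session that runs to the limit."""
    return Decimal(max_session_minutes) * Decimal(rate_per_minute)


# Placeholder rate for illustration only, NOT an actual provider price.
PLACEHOLDER_RATE = "0.01"

capped = worst_case_cost(120, PLACEHOLDER_RATE)        # 2-hour limit
overnight = worst_case_cost(8 * 60, PLACEHOLDER_RATE)  # forgotten 8-hour recording

print("Capped session:   ", capped)
print("Overnight session:", overnight)
```

With any per-minute rate, a recording left running for eight hours costs four times as much as one stopped by a 120-minute limit.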
Best Practices
- Start with Local Speech Recognition (Whisper) to familiarize yourself with the feature, then try Parakeet if your device is supported
- Test cloud providers with short recordings before important sessions
- Monitor your API usage on provider dashboards to track costs
- Use different providers for different scenarios based on your needs
- Switch to local when traveling or in areas with limited internet
- Set appropriate maximum session durations (60-120 minutes for typical meetings)
Troubleshooting
API Key Not Working
- Ensure you copied the complete key without spaces
- Verify that your account has available credits
- Check that the API key has the necessary permissions
- Try regenerating the key from the provider’s dashboard
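If you want to rule out a copy-and-paste problem outside the app, a short script can help. The Python sketch below strips stray whitespace from a key and builds (but does not send) a verification request. The helper names are hypothetical. The OpenAI request uses the standard models-list endpoint with a Bearer token; the Deepgram request assumes the projects endpoint with a `Token` authorization header, so check Deepgram's API documentation if it behaves differently for your key.

```python
import urllib.request


def clean_key(raw_key):
    """Remove spaces, tabs, and line breaks picked up while copying."""
    return "".join(raw_key.split())


def build_openai_check(api_key):
    """Request that lists models; a 401 response means the key is rejected."""
    return urllib.request.Request(
        "https://api.openai.com/v1/models",
        headers={"Authorization": f"Bearer {clean_key(api_key)}"},
    )


def build_deepgram_check(api_key):
    """Request that lists projects (endpoint assumed, see the note above)."""
    return urllib.request.Request(
        "https://api.deepgram.com/v1/projects",
        headers={"Authorization": f"Token {clean_key(api_key)}"},
    )


# Example with a made-up key that was copied with stray whitespace.
request = build_openai_check("  sk-example-key\n")
print(request.full_url)
```

To actually run the check, pass the request to `urllib.request.urlopen(request)` and catch `urllib.error.HTTPError`; an error with code 401 indicates the provider did not accept the key.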
Connection Test Failed
- Check your internet connection stability
- Verify that your firewall isn’t blocking WebSocket connections
- Ensure the API key is active and has sufficient quota
- Wait a moment and try again (the service may be temporarily unavailable)
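As a quick first check from a computer on the same network, you can test whether the provider's host is reachable at all. The Python sketch below opens a plain TCP connection on port 443. The function name is hypothetical, and a successful result only shows that the host can be reached; it does not prove that WebSocket traffic is allowed through your firewall.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    for host in ("api.deepgram.com", "api.openai.com"):
        status = "reachable" if can_reach(host) else "NOT reachable"
        print(f"{host}: {status}")
```

If a host is not reachable here, the problem is likely with the network or firewall rather than with your API key.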
Transcription Issues
- For Whisper: Try a different model size
- For Parakeet: If transcripts come out in the wrong language for a multilingual session, switch to Whisper for that session
- For Cloud: Check internet connection stability
- Ensure your microphone is properly configured
- Minimize background noise during recording
Settings Not Saving
- Wait for the “Saved” indicator to appear
- Don’t switch screens while saving
- Restart the app if issues persist
- Ensure you have a stable internet connection
Your API keys are stored securely in your device’s encrypted keychain and never transmitted to Hedy’s servers. For maximum privacy with sensitive conversations, always use a local engine (Whisper or Parakeet).