STT is a tiny tray app that turns push-to-talk into text. Hold Right Alt, speak, release — the transcript is pasted into whatever window you're in. 100% local, powered by faster-whisper. No cloud, no API keys.
Built for people who care where their audio goes. STT runs the whole transcription pipeline locally using Whisper-family models — no servers, no telemetry, no API keys.
Audio is captured, processed, and transcribed on-device by faster-whisper. Nothing is uploaded or logged, and nothing phones home. It's a single Python script, so you can read it top to bottom.
No internet after the one-time model download. Drop into airplane mode and STT keeps transcribing — nothing runs behind a login.
MIT licensed. No "free tier." No credit card. No model weights paywalled behind a signup.
Same family of models as the cloud tools. Switch between tiny, base and small straight from the tray menu — CPU (int8) by default, CUDA if you want it.
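The model and device choice described above could map onto faster-whisper roughly as in the sketch below. This is not taken from `stt.py`: the helper names `model_settings` and `load_model` are hypothetical, `base` as the default size and `float16` as the CUDA compute type are assumptions, and only the `WhisperModel(model_size_or_path, device=..., compute_type=...)` constructor comes from faster-whisper itself.

```python
from typing import Any

# Sizes named in the text above; faster-whisper accepts others as well.
MODEL_SIZES = ("tiny", "base", "small")


def model_settings(size: str = "base", use_cuda: bool = False) -> dict[str, Any]:
    """Build keyword arguments for faster_whisper.WhisperModel.

    Assumption: "base" is the default size and CUDA runs in float16.
    The CPU path uses int8, as the text states.
    """
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if use_cuda:
        return {"model_size_or_path": size, "device": "cuda", "compute_type": "float16"}
    return {"model_size_or_path": size, "device": "cpu", "compute_type": "int8"}


def load_model(size: str = "base", use_cuda: bool = False):
    """Load the model; the first call for a given size downloads its weights."""
    # Imported lazily so the settings helper works without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(**model_settings(size, use_cuda))
```

With faster-whisper installed, `segments, info = load_model("tiny").transcribe("clip.wav")` yields segments whose `.text` fields can be joined into the transcript; `clip.wav` is a placeholder file name.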
Pick your size. The default model is a ~150 MB download on first run, cached in your Hugging Face folder. After that it's yours, permanently.
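To find where the downloaded weights end up, the sketch below follows the documented Hugging Face cache rules (`HF_HUB_CACHE`, then `HF_HOME`, then `XDG_CACHE_HOME`, then `~/.cache/huggingface/hub`). The function names are hypothetical, and the repo id `Systran/faster-whisper-base` is an assumption about which model the default maps to.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hf_hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the Hugging Face hub cache directory from environment variables."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "huggingface" / "hub"


def cached_model_dir(repo_id: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the folder the hub cache uses for a model repo (models--org--name)."""
    return hf_hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))


if __name__ == "__main__":
    # Assumed repo id for the default model; adjust to the size you picked.
    print(cached_model_dir("Systran/faster-whisper-base"))
```

Once that folder exists, setting `HF_HUB_OFFLINE=1` tells the Hugging Face hub client to use only cached files, which is one way to confirm nothing is fetched after the first run.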
STT is a single Python script. Read it, star it, fork it, open an issue. Stats below are pulled live from the GitHub API when the page loads.
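For reference, the same numbers can be read from the public GitHub REST endpoint for the repository. The sketch below is a Python illustration only, not the page's own code: the function names are hypothetical, and which fields the page actually displays is an assumption.

```python
import json
from urllib.request import Request, urlopen

REPO_API_URL = "https://api.github.com/repos/NYOGamesCOM/STT"


def parse_repo_stats(payload: dict) -> dict:
    """Pick out the counters from a GitHub repository payload."""
    return {
        "stars": payload.get("stargazers_count", 0),
        "forks": payload.get("forks_count", 0),
        # GitHub counts open pull requests in this field too.
        "open_issues": payload.get("open_issues_count", 0),
    }


def fetch_repo_stats(url: str = REPO_API_URL, timeout: float = 10.0) -> dict:
    """Fetch live stats; unauthenticated requests are rate limited by GitHub."""
    request = Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "stt-stats-sketch",  # GitHub rejects requests without a User-Agent
        },
    )
    with urlopen(request, timeout=timeout) as response:
        return parse_repo_stats(json.load(response))


if __name__ == "__main__":
    print(fetch_repo_stats())
```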
git clone https://github.com/NYOGamesCOM/STT
PS> git clone https://github.com/NYOGamesCOM/STT
PS> cd STT
PS> pip install -r requirements.txt
PS> python stt.py  # Admin required for global hotkeys
$ git clone https://github.com/NYOGamesCOM/STT
$ cd STT
$ pip install -r requirements.txt
$ python3 stt.py  # macOS: grant Accessibility perm
Pre-built binaries from our GitHub Releases — no Python install required. Each download always points to the latest tagged release.
Right-click → Run as administrator (global hotkeys need elevated privileges).
Unzip, then grant Accessibility in System Settings → Privacy & Security.
chmod +x STT-linux-x64 && sudo ./STT-linux-x64
The paid apps are great at what they do — we just think the core feature (turning your voice into text) shouldn't cost a subscription, require cloud round-trips, or hoard your audio.