Real-time voice translation that turns your phone into a personal interpreter.
Google Translate is a dictionary you carry. GoLingo is an interpreter that speaks for you.
A voice-first translation app for travelers. Tap a flag and talk — the app listens, translates with AI, and speaks the translation out loud, automatically. The other person taps their flag and responds.
No language flipping, no play buttons, no awkwardness. Just two people talking through a phone that gets out of the way. I designed and built it as a working proof of concept to explore one question: what if translation tools were designed for conversation instead of lookup?
Travelers already use Google Translate. It works. But in a real face-to-face interaction it's a multi-step process: open the app, tap the mic, speak, wait, read the screen, tap the speaker to play the audio, then flip the language so the other person can respond. Repeat for every exchange.
The friction isn't in translation quality — it's in the interaction design. Every extra tap, every flip, every "now press play" moment breaks conversational flow and reminds both people they're operating a tool instead of talking to each other.
The real cost: travelers skip conversations entirely. They point at menu items instead of asking what's good. They eat at the tourist restaurant with the English menu. They don't ask the local about the hidden beach — not because they can't translate, but because the process is just awkward enough to avoid.
The gap isn't translation quality. It's interaction design.
Make the output audio instead of text, and play it automatically instead of on a tap — and the whole social dynamic shifts. You look at the person instead of your phone. They hear a voice instead of squinting at a screen. The design challenge wasn't "how do we translate better." It was "how do we reduce a 6-step process to a 2-step one."
Tapping a flag is explicit and unambiguous: "I'm about to speak this language."
The app detects when you're done (3-second silence), translates with AI, and speaks the result out loud — no extra taps. The other person taps their flag and responds the same way.
Each language has its own flag instead of one mic with a toggle. Tapping a flag means "I'll speak this." No confusion about which language is active, no accidental wrong-language recordings.
The translation plays automatically after processing. Any pause to find a "play" button breaks rhythm and makes both people wait while one operates the UI. Removing that tap turns "using a tool" into "having a conversation."
Continuous listening with a 3-second silence threshold allows natural mid-sentence pauses without cutting the speaker off, then auto-triggers translation when they finish. Manual stop stays available for noisy places.
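One way to picture the silence threshold is as a small state tracker fed by the speech recognizer. This is a minimal sketch rather than the app's actual code: `TurnTracker`, `noteSpeech`, and `isTurnOver` are hypothetical names, and timestamps are passed in explicitly so the logic stays independent of any browser API.

```typescript
// Hypothetical helper, not taken from the GoLingo source.
// Decides when a speaker's turn is over based on how long they have been quiet.
const SILENCE_THRESHOLD_MS = 3000; // the 3-second threshold described above

class TurnTracker {
  private thresholdMs: number;
  private lastSpeechAt: number | null = null;
  private stoppedManually = false;

  constructor(thresholdMs: number = SILENCE_THRESHOLD_MS) {
    this.thresholdMs = thresholdMs;
  }

  // Call whenever the recognizer reports new speech (interim or final result).
  noteSpeech(nowMs: number): void {
    this.lastSpeechAt = nowMs;
  }

  // Manual stop, for noisy places where silence never really arrives.
  stop(): void {
    this.stoppedManually = true;
  }

  // True once the speaker has said something and then stayed quiet for the
  // full threshold, or as soon as they have stopped manually.
  isTurnOver(nowMs: number): boolean {
    if (this.stoppedManually) return true;
    if (this.lastSpeechAt === null) return false; // nothing said yet
    return nowMs - this.lastSpeechAt >= this.thresholdMs;
  }
}
```

In a real page, `noteSpeech` would probably be called from the recognizer's result handler and `isTurnOver` polled on a timer. A short mid-sentence pause resets nothing: only a full three seconds without speech ends the turn.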
The app knows your destination and dates, so it defaults to the right language pair — and could surface context-aware phrases (restaurant, emergencies, directions). It removes the setup step other apps demand at every interaction.
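A sketch of how a default language pair could fall out of trip data. The `Trip` shape, the field names, and the `defaultLanguagePair` function are assumptions for illustration; the write-up does not say how the app stores trips.

```typescript
// Hypothetical data shapes; dates are ISO strings (YYYY-MM-DD) so they
// compare correctly as plain strings.
interface Trip {
  destinationLanguage: string; // e.g. "es"
  startDate: string;
  endDate: string;
}

interface LanguagePair {
  mine: string;
  theirs: string;
}

// Returns the pair for the trip that covers `today`, or null when the
// traveler is not currently on a trip and has to pick languages by hand.
function defaultLanguagePair(
  homeLanguage: string,
  trips: Trip[],
  today: string
): LanguagePair | null {
  const active = trips.find(
    (trip) => trip.startDate <= today && today <= trip.endDate
  );
  return active
    ? { mine: homeLanguage, theirs: active.destinationLanguage }
    : null;
}
```

With a pair resolved this way, the two flags can be shown as soon as the app opens, which is the setup step the paragraph above describes removing.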
Translations accumulate as a scrollable, chat-style feed above the flags. Both speakers keep a visual reference; older messages fade as new ones arrive. It gives the interaction memory — something traditional tools lack.
Dark gradient with lime accents — deliberately distinct from utilitarian translation tools. The dark theme cuts visual distraction across varied lighting (bright markets, dim restaurants) and frames the app as a travel companion, not a utility.
Each exchange stacks into a chat-style feed — original text faded, translation in green — with a flag marking who spoke.
Both people can glance back at what's been said so far, so the phone holds the thread of the conversation instead of resetting after every line.
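The feed can be modeled as an append-only list where each entry's opacity depends on its age. This is an illustrative sketch only: the `Exchange` shape, the fade constants, and both function names are assumptions, not values from the app.

```typescript
// Hypothetical model of one line in the conversation feed.
interface Exchange {
  language: string;    // which flag was tapped, e.g. "en" or "es"
  original: string;    // what was said (rendered faded)
  translation: string; // what was spoken aloud (rendered in green)
}

// Assumed fade settings: each older message loses 25% opacity, down to a floor.
const FADE_STEP = 0.25;
const MIN_OPACITY = 0.25;

// Returns a new feed with the exchange added at the end (newest last).
function appendExchange(feed: readonly Exchange[], next: Exchange): Exchange[] {
  return [...feed, next];
}

// The newest message is fully opaque; older ones dim step by step
// but never drop below MIN_OPACITY, so they stay readable.
function opacityFor(index: number, feedLength: number): number {
  const age = feedLength - 1 - index;
  return Math.max(MIN_OPACITY, 1 - age * FADE_STEP);
}
```

Keeping the list immutable suits React state updates, and keeping the fade as a pure function of position means no per-message timers are needed.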
Not a replacement. Google Translate is an incredible tool, with camera translation, 130 languages, and offline mode. GoLingo focuses on the one case Google Translate handles clumsily: real-time spoken conversation between two people.
Travelers aged 35–65 who already use Google Translate abroad. They don't want more features or more languages — they want less friction.
They want to order at the local place, ask the taxi driver to go somewhere specific, or chat with the hotel owner — without feeling like they're performing a tech demo. The key insight: they don't avoid translation because it doesn't work. They avoid it because the interaction overhead makes it socially uncomfortable.
A React PWA built with Vite. AI translation runs on the Claude API, with a system prompt tuned for natural, idiomatic conversation rather than word-for-word output. The prompt handles formality registers, speech disfluencies, and language-specific nuance.
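To make the prompt idea concrete, here is a sketch of a request body in the shape the Anthropic Messages API expects (`model`, `max_tokens`, `system`, `messages`). The prompt wording, the `buildTranslationRequest` name, and the `max_tokens` value are my assumptions; the app's real system prompt is not shown in this write-up, and the model ID is left as a parameter rather than guessed.

```typescript
// Body shape for a POST to the Messages API endpoint (/v1/messages).
interface TranslationRequestBody {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user"; content: string }[];
}

// Hypothetical builder: the instructions below illustrate the goals named
// above (idiomatic output, formality, disfluencies), not the actual prompt.
function buildTranslationRequest(
  text: string,
  fromLanguage: string,
  toLanguage: string,
  model: string
): TranslationRequestBody {
  const system = [
    `You are interpreting a live spoken conversation from ${fromLanguage} to ${toLanguage}.`,
    "Translate for natural, idiomatic speech rather than word for word.",
    "Match the speaker's level of formality.",
    "Drop filler words, false starts, and repetitions.",
    "Reply with the translation only.",
  ].join(" ");

  return {
    model,
    max_tokens: 512, // assumed cap; spoken turns are short
    system,
    messages: [{ role: "user", content: text }],
  };
}
```

Asking for the translation only matters here because the reply goes straight to text-to-speech: any preamble or explanation would be read aloud to the other person.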
Speech recognition uses the Web Speech API in continuous mode; text-to-speech uses native synthesis with smart voice selection that prefers premium voices when available. Fully functional on desktop Chrome and Android.
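Voice selection of this kind can be sketched as a scoring pass over the list the browser returns from `speechSynthesis.getVoices()`. The heuristic below is an assumption: the Web Speech API has no "premium" flag, so this sketch guesses from name hints such as "Enhanced" or "Natural". `VoiceLike` mirrors two fields of the browser's `SpeechSynthesisVoice`, and `pickVoice` is a hypothetical name.

```typescript
// Minimal structural stand-in for SpeechSynthesisVoice (name and lang only).
interface VoiceLike {
  name: string;
  lang: string; // BCP 47 tag, e.g. "es-ES"; some platforms report "es_ES"
}

// Assumed name fragments that tend to mark higher-quality voices.
const PREMIUM_HINTS = ["premium", "enhanced", "natural", "neural"];

const normalizeLang = (lang: string): string =>
  lang.toLowerCase().replace("_", "-");

// Picks the best voice for the target language, or undefined if none match.
function pickVoice<V extends VoiceLike>(
  voices: V[],
  targetLang: string
): V | undefined {
  const target = normalizeLang(targetLang);
  const base = target.split("-")[0];

  // Keep only voices in the same base language ("es" matches "es-MX").
  const candidates = voices.filter(
    (v) => normalizeLang(v.lang).split("-")[0] === base
  );

  const score = (v: V): number => {
    let points = 0;
    const name = v.name.toLowerCase();
    if (PREMIUM_HINTS.some((hint) => name.includes(hint))) points += 2;
    if (normalizeLang(v.lang) === target) points += 1; // exact region match
    return points;
  };

  // Highest score wins; earlier voices win ties.
  return candidates.reduce<V | undefined>(
    (best, v) => (best === undefined || score(v) > score(best) ? v : best),
    undefined
  );
}
```

In the browser, the chosen voice would be assigned to a `SpeechSynthesisUtterance` before calling `speechSynthesis.speak`. Falling back to any voice in the base language keeps speech working on devices that ship only one regional variant.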