TASSOS GKOUVAS — SENIOR PRODUCT DESIGNER

Side project · Voice translation

GoLingo

Real-time voice translation that turns your phone into a personal interpreter.

Google Translate is a dictionary you carry. GoLingo is an interpreter that speaks for you.

2026 · Working PoC

Concept → prototype in one sprint

React · Claude API · Web Speech

GoLingo app mockup: Listening state ("Speak in Spanish", tap to stop) · Talking state ("Talking in English", showing "good morning, I'm looking for a café nearby", tap to interrupt) · flag buttons to start talking in Spanish or English

GoLingo concept screens — home, language select, listening

Concept screens — designed in Figma, then built as a working app.

A voice-first translation app for travelers. Tap a flag and talk — the app listens, translates with AI, and speaks the translation out loud, automatically. The other person taps their flag and responds.

No language flipping, no play buttons, no awkwardness. Just two people talking through a phone that gets out of the way. I designed and built it as a working proof of concept to explore one question: what if translation tools were designed for conversation instead of lookup?

The problem

Travelers already use Google Translate. It works. But in a real face-to-face interaction it's a multi-step process: open the app, tap the mic, speak, wait, read the screen, tap the speaker to play the audio, then flip the language so the other person can respond. Repeat for every exchange.

The friction isn't in translation quality — it's in the interaction design. Every extra tap, every flip, every "now press play" moment breaks conversational flow and reminds both people they're operating a tool instead of talking to each other.

The real cost: travelers skip conversations entirely. They point at menu items instead of asking what's good. They eat at the tourist restaurant with the English menu. They don't ask the local about the hidden beach — not because they can't translate, but because the process is just awkward enough to avoid.

The insight

The gap isn't translation quality. It's interaction design.

Make the output audio instead of text, and play it automatically instead of on a tap — and the whole social dynamic shifts. You look at the person instead of your phone. They hear a voice instead of squinting at a screen. The design challenge wasn't "how do we translate better." It was "how do we reduce a 6-step process to a 2-step one."

The solution — two steps

Tap your language flag

Explicit and unambiguous — "I'm about to speak this language."

Speak

The app detects when you're done (3-second silence), translates with AI, and speaks the result out loud — no extra taps. The other person taps their flag and responds the same way.

GoLingo talking and language-select states

The "why" behind each decision

Flag-tap-to-listen

Each language has its own flag instead of one mic with a toggle. Tapping a flag means "I'll speak this." No confusion about which language is active, no accidental wrong-language recordings.
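
A minimal sketch of why one flag tap is enough, assuming a conversation always has exactly two languages. The names `Turn` and `turnForFlag` are hypothetical and are not taken from the GoLingo code.

```typescript
// Hypothetical shape of one speaking turn.
interface Turn {
  listenIn: string; // language the recognizer should expect
  speakIn: string;  // language the translation is spoken in
}

// With exactly two languages in play, the tapped flag determines both
// the input language and the output language. No toggle state is needed.
function turnForFlag(tapped: string, pair: readonly [string, string]): Turn {
  const [first, second] = pair;
  if (tapped !== first && tapped !== second) {
    throw new Error(`Flag ${tapped} is not part of the active language pair`);
  }
  return { listenIn: tapped, speakIn: tapped === first ? second : first };
}
```

Because the direction is derived from the tap itself, there is no stored "current language" that can drift out of sync with who is actually speaking.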

Auto-playback · the key decision

The translation plays automatically after processing. Any pause to find a "play" button breaks rhythm and makes both people wait while one operates the UI. Removing that tap turns "using a tool" into "having a conversation."

Silence detection

Continuous listening with a 3-second silence threshold allows natural mid-sentence pauses without cutting the speaker off, then auto-triggers translation when they finish. Manual stop stays available for noisy places.
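
The silence rule can be sketched as a small timer-free class. This is an illustration under assumptions, not the app's implementation: `SilenceDetector` is a hypothetical name, and timestamps are passed in explicitly so the logic can be checked without a microphone.

```typescript
// Hypothetical helper: tracks when speech was last heard and reports
// when the silence threshold (3 seconds in the text above) has passed.
class SilenceDetector {
  private lastSpeechAt: number | null = null;
  private readonly thresholdMs: number;

  constructor(thresholdMs: number = 3000) {
    this.thresholdMs = thresholdMs;
  }

  // Call whenever the recognizer reports new speech.
  onSpeech(nowMs: number): void {
    this.lastSpeechAt = nowMs;
  }

  // True once the speaker has said something and then stayed quiet
  // for at least the threshold. Never true before any speech is heard.
  isFinished(nowMs: number): boolean {
    return this.lastSpeechAt !== null && nowMs - this.lastSpeechAt >= this.thresholdMs;
  }

  // Call after translation is triggered, or on a manual stop.
  reset(): void {
    this.lastSpeechAt = null;
  }
}
```

A pause shorter than the threshold simply gets overwritten by the next `onSpeech` call, which is what lets the speaker hesitate mid-sentence without being cut off.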

Trip context

The app knows your destination and dates, so it defaults to the right language pair — and could surface context-aware phrases (restaurant, emergencies, directions). It removes the setup step other apps demand at every interaction.
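
One way the default language pair could be derived from trip data, as a rough sketch. The `Trip` shape, the `COUNTRY_LANGUAGE` table, and `defaultPair` are all assumptions made for illustration; the real app may store and resolve this differently.

```typescript
// Hypothetical trip record. Dates are ISO strings (YYYY-MM-DD), which
// compare correctly as plain strings.
interface Trip {
  destinationCountry: string; // ISO country code, e.g. "ES"
  startDate: string;
  endDate: string;
}

// Illustrative lookup only. A real table would cover far more countries.
const COUNTRY_LANGUAGE: Record<string, string> = {
  ES: "es-ES",
  FR: "fr-FR",
  IT: "it-IT",
  GR: "el-GR",
};

// Returns [home language, local language] for the trip that is active
// today, or null when no trip is active or the country is unknown.
function defaultPair(trips: Trip[], homeLanguage: string, today: string): [string, string] | null {
  const active = trips.find((t) => t.startDate <= today && today <= t.endDate);
  if (!active) return null;
  const local = COUNTRY_LANGUAGE[active.destinationCountry];
  return local ? [homeLanguage, local] : null;
}
```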

Conversation feed

Translations accumulate as a scrollable, chat-style feed above the flags. Both speakers keep a visual reference; older messages fade as new ones arrive. It gives the interaction memory — something traditional tools lack.
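
The fading feed can be approximated with an opacity that drops with message age. This is a sketch with made-up numbers: the 0.25 step and the 0.25 floor are assumptions, and `FeedMessage`, `appendMessage`, and `opacityFor` are hypothetical names.

```typescript
// Hypothetical message shape for the chat-style feed.
interface FeedMessage {
  speaker: string;     // flag or language code of whoever spoke
  original: string;    // what was said
  translation: string; // what the app spoke aloud
}

// The feed stays scrollable, so nothing is dropped: new messages are appended.
function appendMessage(feed: readonly FeedMessage[], message: FeedMessage): FeedMessage[] {
  return [...feed, message];
}

// Opacity by distance from the newest message (0 = newest).
// Older messages fade in equal steps but never below a readable floor.
function opacityFor(indexFromNewest: number, step: number = 0.25, floor: number = 0.25): number {
  return Math.max(floor, 1 - indexFromNewest * step);
}
```

In a React component, `opacityFor(feed.length - 1 - i)` could be applied as an inline style to the message at index `i`.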

Dark, premium aesthetic

Dark gradient with lime accents — deliberately distinct from utilitarian translation tools. The dark theme cuts visual distraction across varied lighting (bright markets, dim restaurants) and frames the app as a travel companion, not a utility.

GoLingo conversation feed with original and translated messages

The conversation, remembered

Each exchange stacks into a chat-style feed — original text faded, translation in green — with a flag marking who spoke.

Both people can glance back at what's been said so far, so the phone holds the thread of the conversation instead of resetting after every line.

GoLingo vs Google Translate

Not a replacement. Google Translate is an incredible tool, with camera translation, more than 130 languages, and offline mode. GoLingo focuses on the one case Google Translate handles clumsily: real-time spoken conversation between two people.

| Aspect | Google Translate | GoLingo |
| --- | --- | --- |
| Primary output | Text on screen | Spoken audio |
| Audio playback | Manual tap required | Automatic |
| Language switching | Flip / toggle required | Tap the other flag |
| Steps per exchange | 4–6 taps | 1 tap + speak |
| Designed for | Translation lookup | Live conversation |
| Trip awareness | None | Destination, dates, language defaults |
| Social dynamic | Both people stare at the phone | Both people look at each other |

Who this is for

Travelers aged 35–65 who already use Google Translate abroad. They don't want more features or more languages — they want less friction.

They want to order at the local place, ask the taxi driver to go somewhere specific, or chat with the hotel owner — without feeling like they're performing a tech demo. The key insight: they don't avoid translation because it doesn't work. They avoid it because the interaction overhead makes it socially uncomfortable.

Under the hood

A React PWA built with Vite. AI translation runs on Claude's API with a system prompt tuned for natural, idiomatic conversation — not word-for-word — handling formality registers, speech disfluencies, and language-specific nuance.
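
A sketch of what a conversational translation request could look like. The prompt wording here is illustrative and is not the system prompt GoLingo ships with. `buildTranslationRequest` is a hypothetical helper, and the model name is left to the caller rather than guessed. The returned object follows the general shape of an Anthropic Messages API request body (`model`, `max_tokens`, `system`, `messages`).

```typescript
// Shape of the request body this sketch produces.
interface TranslationRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user"; content: string }[];
}

// Builds a request that asks for an interpreter-style translation:
// meaning over literal wording, matching formality, ignoring disfluencies.
function buildTranslationRequest(
  transcript: string,
  fromLanguage: string,
  toLanguage: string,
  model: string, // caller supplies the model id; none is assumed here
): TranslationRequest {
  const system = [
    `You are a live interpreter translating spoken ${fromLanguage} into ${toLanguage}.`,
    "Translate for meaning and natural idiom, not word for word.",
    "Match the speaker's level of formality.",
    "Ignore filler words, false starts, and repetitions.",
    "Reply with the translation only, with no notes or quotation marks.",
  ].join(" ");

  return {
    model,
    max_tokens: 300, // assumption: short conversational turns
    system,
    messages: [{ role: "user", content: transcript }],
  };
}
```

Asking for the translation only matters here because the reply is passed straight to text-to-speech, so any commentary would be read aloud. In a deployed PWA the request would normally go through a small backend so the API key never reaches the browser.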

Speech recognition uses the Web Speech API in continuous mode; text-to-speech uses native synthesis with smart voice selection that prefers premium voices when available. Fully functional on desktop Chrome and Android.
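
Voice selection could work roughly like this. Browsers do not expose a "premium" flag on voices, so this sketch assumes a name-based heuristic. The `PREMIUM_HINT` pattern, the `VoiceLike` interface, and `pickVoice` are illustrative and are not the app's actual rules.

```typescript
// Minimal subset of the browser's SpeechSynthesisVoice that the picker needs.
interface VoiceLike {
  name: string;
  lang: string; // BCP 47 tag; some platforms report "en_US" instead of "en-US"
}

// Assumption: higher-quality voices often carry one of these words in their name.
const PREMIUM_HINT = /premium|enhanced|natural|neural/i;

// Picks the best voice for a language: premium-looking voices win, then an
// exact regional match, then any voice in the same base language.
function pickVoice<V extends VoiceLike>(voices: readonly V[], lang: string): V | undefined {
  const normalize = (tag: string): string => tag.replace("_", "-").toLowerCase();
  const wanted = normalize(lang);
  const dash = wanted.indexOf("-");
  const base = dash === -1 ? wanted : wanted.slice(0, dash);

  let best: V | undefined;
  let bestScore = -1;
  for (const voice of voices) {
    const tag = normalize(voice.lang);
    if (tag !== base && !tag.startsWith(base + "-")) continue; // wrong language
    const score = (PREMIUM_HINT.test(voice.name) ? 2 : 0) + (tag === wanted ? 1 : 0);
    if (score > bestScore) {
      best = voice;
      bestScore = score;
    }
  }
  return best;
}
```

In the browser, the result of `pickVoice(speechSynthesis.getVoices(), "es-ES")` could be assigned to the `voice` property of a `SpeechSynthesisUtterance` before calling `speechSynthesis.speak`, which is also where auto-playback would happen.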

Where it goes next

→ Whisper + ElevenLabs for near-human voice quality and full iOS support.
→ Smart phrase suggestions from trip context and conversation topic.
→ Offline capability for poor-connectivity areas.
→ User testing with real travelers in real situations.
→ Conversation bookmarking — save useful phrases for quick replay.