We're Building the Voice Layer for Southeast Asia!
The Problem Is Bigger Than Transcription
600 million people in Southeast Asia speak in ways that existing voice AI systems fail to understand. Not because the technology is broken. Because it was never built for them. Singlish. Manglish. Bahasa campur. Code-switched Vietnamese. Thai with English inserts. These are not edge cases. They are the default. Every enterprise, contact centre, clinic, and logistics operator in the region runs on speech that today's models treat as noise. We're fixing that — at the infrastructure level.
The Scale of the Problem
- 600 million people speaking in ways no current AI understands
- Languages and dialects in our training and evaluation set
- Core markets where we're deployed and expanding
Open Roles — If You're the Right Fit, You'll Know
- Speech / Applied ML Engineer (Intern) · Singapore · Fully Remote
- Solutions Engineer / Forward Deployed Engineer (Intern) · Singapore · Fully Remote
- Site Reliability Engineer (Intern) · Singapore · Fully Remote
- Security / Privacy Engineer (Intern) · Singapore · Fully Remote
- QA Engineer (Intern) · Singapore · Fully Remote
How We Work
- Ownership is assumed, not assigned. When something needs to exist, someone builds it. You don't wait for a spec. You don't wait to be asked.
- Speed is discipline: we ship to learn, iterate fast, and don't confuse motion with progress.
- We operate in ambiguity. Most of what we're doing hasn't been done before, especially at the intersection of low-resource languages, speech semantics, and real-time systems.
- Trust is default. We hire for character and give people room to work. If you need to be managed, this isn't the right place.
- We care about depth. We'd rather go deep on one hard problem than ship five shallow features.
- We build what should already exist, not because it's impressive, but because it matters.
Before VALSEA vs. After VALSEA
- BEFORE → Generic ASR output: "Can you lah pass me the file or not ah I send you tru whatsapp" (raw text, no structure, no action)
- AFTER → VALSEA output: Intent: File transfer request | Action: Send via WhatsApp | Assignee: [Recipient] | Status: Pending | Priority: Normal
This is what speech-to-meaning looks like.
Real transcript sample
Singlish contact centre audio, Singapore
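To make the before/after concrete, here is a minimal Python sketch of how a raw transcript could be mapped to a structured record with the same fields as the sample output. It is an illustration only, not VALSEA's pipeline: the particle list, the spelling map, the regex rules, and the names `WorkflowRecord`, `normalise`, and `extract` are all hypothetical, and dropping discourse particles is a deliberate simplification.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup tables for illustration; a real system would learn
# or curate these per language variant.
DISCOURSE_PARTICLES = {"lah", "ah", "leh", "lor", "meh"}
SPELLING_VARIANTS = {"tru": "through"}


@dataclass
class WorkflowRecord:
    """Structured output with the fields shown in the sample above."""
    intent: str
    action: str
    assignee: str = "[Recipient]"
    status: str = "Pending"
    priority: str = "Normal"


def normalise(utterance: str) -> str:
    """Lowercase, drop discourse particles, and map spelling variants."""
    tokens = re.findall(r"[a-z']+", utterance.lower())
    kept = [SPELLING_VARIANTS.get(t, t) for t in tokens
            if t not in DISCOURSE_PARTICLES]
    return " ".join(kept)


def extract(utterance: str) -> WorkflowRecord:
    """Apply two toy rules: one for the intent, one for the channel."""
    text = normalise(utterance)
    if re.search(r"\b(pass|send)\b.*\bfile\b", text):
        intent = "File transfer request"
    else:
        intent = "Unknown"
    action = "Send via WhatsApp" if "whatsapp" in text else "Unspecified"
    return WorkflowRecord(intent=intent, action=action)


record = extract("Can you lah pass me the file or not ah I send you tru whatsapp")
print(record)
```

Keeping normalisation separate from extraction means the rules only ever see cleaned text, which is one plausible way to let the two stages evolve independently.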
What You'll Actually Build
- Code-switching ASR: Build models that handle mid-sentence language shifts (e.g. English → Bahasa → back to English) without losing semantic continuity
- Semantic correction layer: Design a post-ASR correction pipeline that fixes dialect-specific mishearing, fills in dropped words, and resolves contextual ambiguity
- Evaluation pipeline: Build automated evals for noisy, real-world speech, including WER on code-switched audio and intent-match scoring on structured outputs
- Transcript → workflow engine: Design the structured output schema and rule-based + ML-based extraction system that converts conversational speech into CRM entries, task tickets, and meeting summaries
- Low-latency streaming ASR: Optimise the real-time inference pipeline to achieve sub-300ms word-level outputs on consumer hardware across 5+ SEA language variants
- Slang normalisation: Map regional colloquialisms, contractions, and informal expressions to their formal equivalents without destroying conversational tone or speaker intent
- Data flywheel architecture: Design the feedback loop between production outputs, human correction, and model retraining, so the system gets better the more it's used
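As a rough idea of what the evaluation pipeline item involves, the sketch below computes word error rate (WER) as word-level edit distance divided by reference length, plus a simple exact-match intent score. It is a minimal illustration under stated assumptions: whitespace tokenisation, exact string comparison of intent labels, and the function names `word_error_rate` and `intent_match_rate` are hypothetical and not taken from any existing codebase.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def intent_match_rate(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of utterances whose predicted intent equals the expected one."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    if not expected:
        raise ValueError("at least one example is required")
    matches = sum(e == p for e, p in zip(expected, predicted))
    return matches / len(expected)


# Code-switched example: the hypothesis drops one word from the reference.
wer = word_error_rate("eh boleh send me the file tak",
                      "eh boleh send the file tak")
score = intent_match_rate(["file_transfer", "callback"],
                          ["file_transfer", "complaint"])
print(f"WER: {wer:.3f}, intent match: {score:.2f}")
```

Whitespace tokenisation is the weakest assumption here: it does not apply to unsegmented scripts such as Thai, where a character-level or segmenter-based metric would be needed instead.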
This Is NOT for Everyone
- If you need a detailed brief before starting, you won't thrive here. We write specs to document decisions, not to initiate them.
- If you optimise for optics over outcomes: wrong place. No one here cares what your role title is. Only what you shipped.
- If ambiguity makes you anxious, this isn't the role. We're in early, hard territory. The map doesn't exist yet. You'd be drawing it.
- If you're here to pad a resume, we'll both waste time. The problems we're solving are hard and consequential. They require people who actually care.
- If you need hand-holding on technical decisions, this isn't for you. We expect you to go deeper on your domain than anyone else in the room.