Hamming AI

Automated voice AI agent testing and monitoring

Backend / Infra Engineer

$140K - $200K
0.50% - 1.50% equity
San Francisco, CA, US / London, England, GB / Austin, TX, US
Job type
Full-time
Role
Engineering, Backend
Experience
3+ years
Visa
US citizen/visa only
Skills
Python, Distributed Systems, Node.js
Sumanyu Sharma
Co-Founder & CEO

About the role

Location: Remote (North America) or Austin, TX

Employment Type: Full-time (no contractors)

Department: Engineering

About Hamming AI

Hamming automates QA for voice AI agents. Everyone is building voice agents. We secure them. In fact, we invented this category. With one click, thousands of our agents call our customers’ agents across accents, background noise, and personalities—then we generate crisp bug reports and production-grade analytics. Reliability is the moat in voice AI, and that’s our whole job.

We are one of the fastest-moving engineering teams in the world, deploying to production four times a day.

I’m looking for someone who can own reliability and scale across our LLM-enabled platform, shipping precise, outcome-driven improvements to high-availability systems.

— Sumanyu (CEO)
Previously: grew Citizen 4× and scaled an AI sales program to $100Ms/yr at Tesla.

Devin Case Study

Ranked #1 Eng team

OpenAI Dev Day 100-billion-token list

What you’ll do

  • Own core services in TypeScript/Node.js and Python that orchestrate LiveKit, Temporal, STT/TTS, and LLM tooling for real-time voice agents.
  • Scale 1 → N → 100×: take what works today and harden it for 10K parallel calls with 99.99% uptime. Turn human playbooks into productized systems.
  • Harden pipelines for ingestion, evaluation, and analytics so telephony events, recordings, and outcomes propagate reliably across services.
  • Level-up observability: deepen OpenTelemetry/SigNoz and trace-first practices to shrink mean-time-to-truth in prod.
  • Prototype → test → prod: partner with product to ship new LLM-driven behaviors with clear success metrics, guardrails, and regressions blocked in CI.
  • Infrastructure readiness: CI/CD, environment automation, incident response playbooks—customer conversations stay online.

You might be a fit if you

  • Have senior/staff experience running distributed backends with real-time/streaming constraints.
  • Are fluent in TypeScript/Node.js and comfortable jumping into Python for ML/audio jobs.
  • Know Temporal (or similar workflow engines), queues, Redis, and PostgreSQL.
  • Have shipped production LLM apps and understand prompt/tool design, evals, and guardrail instrumentation.
  • Operate cloud-native on AWS with Terraform; k8s doesn’t scare you.
  • Are a power user of Cursor/Zed/Devin and were using code-gen before it was cool.
  • Have intuition for what current-gen LLMs can/can’t do—and what tomorrow’s models will unlock.
  • Think independently, grind with customers, and do whatever it takes—without dropping the quality bar.
  • Bonus: built 0→1 real-time systems in Telecom/Networking, Autonomous Vehicles, or HFT; founded something; built AI voice apps.

Interesting problems you’ll touch

  • Voice simulations that feel real: accents, overlapping speech, crosstalk, background noise, barge-ins.
  • Massive concurrency: 10,000+ parallel calls with deterministic behavior and graceful degradation.
  • Temporal-driven orchestration for long-running, interruptible call flows.
  • Closed-loop reliability: turn prod failures into auto-generated tests and blocked deploys.
  • Trace-everything culture: make “what happened?” a 30-second question, not a war room.
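The "massive concurrency with graceful degradation" problem above can be illustrated with a bounded call pool. This is a minimal sketch under assumed semantics, not Hamming's implementation: `CallPool`, `limit`, and `maxQueued` are illustrative names. Up to `limit` simulated calls run at once, a bounded number wait their turn, and anything beyond that is shed rather than allowed to pile up.

```typescript
type Task<T> = () => Promise<T>;

// Illustrative sketch: bounded concurrency with load shedding.
class CallPool {
  private active = 0;
  private waiters: Array<() => void> = [];

  constructor(private limit: number, private maxQueued: number) {}

  async run<T>(task: Task<T>): Promise<T | 'shed'> {
    if (this.active >= this.limit) {
      // Pool saturated: queue up to maxQueued callers, shed the rest.
      if (this.waiters.length >= this.maxQueued) return 'shed';
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    }
    this.active++;
    try {
      return await task();
    } finally {
      this.active--;
      this.waiters.shift()?.(); // wake the next queued call, if any
    }
  }
}
```

With `new CallPool(2, 2)`, five simultaneous `run` calls yield two running, two queued, and one shed: the system degrades predictably instead of exhausting telephony or memory resources.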

How we work

  • Outcomes over output: we adjust roadmaps when new data lands.
  • Demo early and document decisions so context moves fast.
  • Own incidents: lead the investigation, write crisp notes, land durable fixes.
  • Direct, candid, respectful communication keeps remote teammates in lockstep with Austin HQ.

Our stack

  • App: Next.js, TypeScript, Tailwind
  • AI: OpenAI, Anthropic, STT/TTS providers
  • Realtime/Orchestration: LiveKit, Pipecat/Daily, Temporal
  • Infra/DB: AWS, k8s, PostgreSQL, Redis, Terraform
  • Observability: OpenTelemetry, SigNoz

Apply

If you want to make AI voice agents reliable at scale, let’s talk.

About Hamming AI

Hamming automates the QA of voice agents, covering both pre-deployment testing and post-deployment analytics.

Making voice AI agents reliable is hard. A small change in prompts, function call definitions, or model providers can cause large changes in LLM outputs.
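Output drift like this is typically caught with an automated regression gate in CI. A minimal sketch under assumed semantics (the names `EvalCase`, `passRate`, and `gateDeploy` are illustrative, not Hamming's API): each case pairs an expected outcome with the agent's actual transcript, and a deploy is blocked when the pass rate drops below a threshold.

```typescript
interface EvalCase {
  id: string;       // test call identifier
  expected: string; // outcome phrase the transcript must contain
  actual: string;   // transcript produced by the agent under test
}

// Fraction of cases whose transcript contains the expected outcome.
function passRate(cases: EvalCase[]): number {
  if (cases.length === 0) return 1;
  const passed = cases.filter((c) =>
    c.actual.toLowerCase().includes(c.expected.toLowerCase())
  );
  return passed.length / cases.length;
}

// Returns true when it is safe to deploy; wire into CI to block regressions.
function gateDeploy(cases: EvalCase[], threshold = 0.95): boolean {
  return passRate(cases) >= threshold;
}
```

A substring check is deliberately simplistic; real evals for voice agents also score transcripts with rubric-based LLM judges and audio-level signals.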

We have a proven track record of helping enterprises win with AI. Sumanyu (CEO) previously helped Citizen (a safety app) grow its user base 4× and scaled an AI-powered sales program to hundreds of millions of dollars in revenue per year at Tesla.

Our eng team ranks #1 on https://workweave.ai/

Hamming AI
Founded: 2024
Batch: S24
Team Size: 8
Status: Active
Founders
Sumanyu Sharma
Co-Founder & CEO