Staff / Principal Machine Learning Engineer, Serving - USA
Job Description, Responsibilities & Requirements
About the Position
Mountain View, California, USA
We are seeking a Staff / Principal Machine Learning Engineer to join our team at Inworld, a leading research lab specializing in realtime voice models. Our models are ranked #1 in the world and power some of the largest consumer-facing AI applications across a range of sectors.
Responsibilities
- Design and develop scalable backend services
- Optimize realtime inference and create best-in-class APIs and products
- Collaborate with cross-functional teams to engage users effectively
Requirements
- Inference Optimization: Deep understanding of modern serving frameworks, such as vLLM or TRT-LLM, and the techniques they rely on.
- Model Acceleration: Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
- High-Performance Systems: Proficiency in C++, CUDA, Rust, or highly optimized Python. Ability to profile code and maximize performance on NVIDIA GPUs.
- Distributed Systems & Scaling: Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and handling thousands of concurrent connections reliably.
- Public Work: Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups.
- Full-Cycle Ownership: Ability to take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production.
- Background: PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems.
Who Thrives Here
- Comfortable picking a direction and building the map as you go.
- Bias for impact over purely theoretical optimizations.
- Obsess over the "why" and question architectures for better solutions.
- Thrive on deep context and understanding the fundamental logic behind decisions.
What Working Here Is Like
- We hand you unclear problems and expect you to make them clear.
- We value engineers who say "I don't know yet" and then design the benchmark or prototype that finds out.
- Performance, latency, and reliability are first-class product features.
- Impact comes before everything else, though we support sharing work and open-source contributions that move the field forward.
- Flat structure, fast iterations, minimal process theater.
We believe in the power of in-person collaboration to solve the hardest problems and foster a strong team culture. We offer relocation assistance and look forward to you joining us in our Mountain View office.
We Offer
- Base Salary: $270,000–$500,000
- Bonus: Performance-based
- Equity: Stock options
- Benefits: Comprehensive benefits package
About the Company
Inworld is a research lab of top researchers and engineers, building the world’s top-ranked realtime voice models. Our technology has powered experiences from companies such as NVIDIA, Microsoft Xbox, Niantic, Logitech Streamlabs, Wishroll, Little Umbrella, and Bible Chat. We’ve raised more than $125M from top-tier investors and have been recognized by CB Insights as one of the 100 most promising AI companies globally.
Apply now to join our innovative team and help shape the future of AI!