Job Description, Responsibilities & Requirements
About the Position
Senior / Lead Machine Learning Engineer, Serving - Germany
Inworld is a research lab of top researchers and engineers building the world’s top-ranked realtime voice models. Our models power the largest consumer-facing AI applications available, across categories like health, fitness, learning, therapy, companions, customer experience, and media, representing hundreds of millions of end users. Our work spans research and development of state-of-the-art models, optimization of realtime inference, and creation of best-in-class APIs and products that allow developers to engage their users.
Responsibilities
- Optimize realtime voice models
- Develop state-of-the-art AI applications
- Collaborate with cross-functional teams to ensure models are deployed effectively
Requirements
- Inference Optimization: Deep understanding of modern serving frameworks such as vLLM or TensorRT-LLM (TRT-LLM) and the techniques they implement.
- Model Acceleration: Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
- High-Performance Systems: Proficiency in C++, CUDA, Rust, or highly optimized Python. You know how to profile code and squeeze every ounce of performance out of NVIDIA GPUs.
- Distributed Systems & Scaling: Experience with Kubernetes, Ray, custom load balancing, multi-GPU/multi-node inference, and reliably handling thousands of concurrent connections.
- Public work: Non-trivial systems programming projects, open-source contributions to major inference engines, or deep-dive technical write-ups.
- Full-cycle ownership: You can take a model from the research team, containerize it, optimize its serving, and ensure it runs reliably in production.
- Background: PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems.
- Professional fluency in English: Both written and spoken, as you will collaborate daily with our US-based leadership and engineering teams.
Who Thrives Here
- Comfortable picking a direction and building the map as you go.
- Believes engineering isn't finished until it’s shipped and stable.
- Obsesses over the "why" and questions architectures to solve core latency or throughput problems.
- Thrives on deep context and wants to understand the fundamental logic behind every decision.
What Working Here Is Like
- We hand you unclear problems and expect you to make them clear.
- We value engineers who say "I don't know yet" and then design the benchmark or prototype that finds out.
- We treat performance, latency, and reliability as first-class product features.
- Impact comes before everything else, though we support sharing work and open-source contributions that move the field forward.
- Your work should be visible. Flat structure, fast iterations, minimal process theater.
We Offer
- Opportunity to work with the world’s top-ranked realtime voice models.
- Collaboration with a team of top researchers and engineers.
- Potential for full U.S. visa and relocation support for candidates interested in relocating to the San Francisco Bay Area in the future, subject to business needs and applicable legal and work authorization requirements.
About the Company
Inworld is a research lab of top researchers and engineers, building the world’s top-ranked realtime voice models. We’ve raised more than $125M from Lightspeed, Section 32, Kleiner Perkins, Microsoft’s M12 venture fund, Founders Fund, Meta, and Stanford, among others. Our technology has powered experiences from companies such as NVIDIA, Microsoft Xbox, Niantic, Logitech Streamlabs, Wishroll, Little Umbrella, and Bible Chat. We’ve also been recognized by CB Insights as one of the 100 most promising AI companies globally and have been named one of LinkedIn’s Top 10 Startups in the USA.