Senior Software Engineer
Job Description
Responsibilities
- Lead the design, development, and operation of a next-generation AI workload orchestration platform on Kubernetes, enabling scalable job scheduling, multi-node distributed training, and artifact management across Inferentia and Trainium accelerators.
- Drive architectural decisions across the full stack, from service APIs and workflow orchestration to cluster-level scheduling and resource optimization, targeting high availability and efficient hardware utilization.
- Collaborate with ML researchers, SDK developers, and infrastructure engineers within Annapurna Labs and partner organizations to keep the platform aligned with current and upcoming AI accelerators.
- Mentor and technically guide a team of software engineers, setting standards for operational excellence, system design, and developer experience.
- Architect and implement scalable services spanning Kubernetes operators, workflow orchestration, and scheduling algorithms to maximize hardware utilization across a growing accelerator fleet.
- Work daily with ML researchers, SDK developers, and hardware engineers to translate silicon capabilities into platform features.
- Own the full lifecycle of systems from design and implementation through deployment, monitoring, and incident response.
- Improve developer velocity by refining CLI tools, APIs, and framework integrations to reduce friction between coding and running at scale.
- Participate in design reviews, code reviews, and operational readiness discussions with a small, high-ownership team delivering quickly with immediate customer impact.
- Operate in a fast-paced, startup-like environment where priorities are driven by the next generation of AI hardware on the roadmap.
Requirements
- 5+ years of non-internship professional software development experience
- 5+ years of programming experience in at least one programming language
- 5+ years of leading design or architecture of new and existing systems, including design patterns, reliability, and scaling
- 5+ years of full software development life cycle experience, covering coding standards, code reviews, source control management, build processes, testing, and operations
- Experience as a mentor, tech lead, or leading an engineering team
Technologies
- Kubernetes
Benefits
- Health insurance including medical, dental, vision, prescription drugs, Basic Life & AD&D, with options for supplemental life plans, EAP, mental health support, medical advice line, flexible spending accounts, and adoption or surrogacy reimbursement
- 401(k) matching
- Paid time off
- Parental leave
- Sign-on payments and restricted stock units (RSUs)
A Day in the Life
- Architect and implement scalable services spanning Kubernetes operators, workflow orchestration, and scheduling algorithms to maximize hardware utilization across accelerators.
- Collaborate daily with ML researchers, SDK developers, and hardware engineers to translate silicon capabilities into platform features.
- Own the full lifecycle of your systems from design and implementation through deployment, monitoring, and incident response.
- Improve developer velocity by refining CLI tools, APIs, and framework integrations that reduce friction between writing code and running at scale.
- Participate in design reviews, code reviews, and operational readiness discussions with a small, high-ownership team delivering quickly with immediate customer impact.
- Work in a fast-paced, startup-like environment where priorities are driven by the next generation of AI hardware on the roadmap.
About the Team
Our team welcomes new members and blends a range of experience levels. We foster knowledge sharing and mentorship, with senior members providing one-on-one guidance and thoughtful code reviews. We prioritize career growth and strive to assign projects that help engineers deepen expertise and tackle increasingly complex tasks.
Diverse Experiences
We value diverse backgrounds. Even if you do not meet every qualification, we encourage you to apply. People early in their careers or with nontraditional paths are welcome to consider this opportunity.
About AWS
Amazon Web Services (AWS) is the broadest and most deeply integrated cloud platform, trusted by startups and Global 500 companies alike to power their businesses with a comprehensive suite of products and services.
Inclusive Team Culture
We foster a culture of learning and curiosity. Our employee-led affinity groups promote inclusion, and ongoing events and learning experiences encourage you to embrace and celebrate differences.
Work / Life Balance
We value flexibility and aim to support work-life harmony so success at work does not come at the expense of home life.
Mentorship & Career Growth
We continually raise the performance bar and provide mentorship and knowledge sharing to help you develop into a well-rounded professional capable of taking on more advanced challenges.