Staff AI/ML Engineer, Large-Scale Systems
Prism ML
No recruiters. Founders reply directly.
We build high-performance foundation models designed to run efficiently across a wide range of environments—from edge devices to large-scale deployments. Our work spans models from approximately 1B to 100B+ parameters across LLMs, diffusion models, and other modalities, with a strong focus on scalable training, efficient inference, and real-world deployment.
We are seeking a Staff-level (or higher) AI/ML engineer to lead large-scale model training efforts. This role combines hands-on ownership of large training runs with responsibility for setting technical direction, mentoring engineers, and improving model quality and system performance across the organization.
You will design, implement, and optimize distributed training systems for large-scale models across all major training phases. Core responsibilities include:
You bring deep experience in large-scale AI/ML systems and strong fundamentals in modern model training:
You have additional experience aligned with large-scale, high-performance AI/ML systems:
You have led or significantly contributed to training large models end-to-end, understand common failure modes in large-scale training systems, and know how to debug and improve them. You care about building efficient, reliable systems that work in real-world settings, enjoy mentoring others, and thrive at the intersection of research, engineering, and product.