Senior Machine Learning Engineer - Scene Understanding
Company: Zoox
Location: Boston
Posted on: April 2, 2026
Job Description:
The Perception team at Zoox creates the "eyes and ears" of our
self-driving robots. Navigating safely and efficiently in complex
environments requires detecting, classifying, tracking, and
understanding various attributes of surrounding objects, all in
real time and with exceptional accuracy. As an engineer on the
Scene Understanding team, you will develop advanced
Vision-Language-Action (VLA) models that perceive our vehicle's
surroundings to identify hazards and make driving suggestions. You
will utilize VLA models for detecting rare events and ensuring safe
driving in these situations. You'll work with state-of-the-art
machine learning models that operate in real time on our robotaxi
platform with minimal latency. Collaborating with world-class
engineers and researchers across sensors, planning, and other
teams, you'll have access to premium sensor data and cutting-edge
infrastructure to validate your algorithms in real-world
conditions.

In this role, you will:
- Design and train Vision-Language-Action (VLA) solutions for
  robotaxis
- Lead end-to-end data strategy, including mining, auto-labeling,
  and dataset construction to power our ML flywheel
- Lead the full post-training stack for VLMs and VLAs, including
  Continual Pre-training (CPT) on domain-specific driving data and
  Supervised Fine-Tuning (SFT) for instruction following
- Utilize our large-scale data pipelines and ML infrastructure to
  research, prototype, and deploy solutions that improve driving
  behavior
- Partner with cross-functional teams to integrate perception
  signals

Qualifications:
- MS or PhD in Computer Science or related field
- Background in deep learning solutions for VLM and VLA models
- Track record in post-training large-scale models (CPT, SFT, RL)
- Hands-on experience with production ML pipelines, including
  dataset creation, training frameworks, and metrics
- Expertise in Python libraries (PyTorch, NumPy, Pandas, vLLM)

Bonus Qualifications:
- Deep knowledge of cutting-edge computer vision techniques
- Publications in top-tier conferences (CVPR, ICCV, RSS, ICRA)
- Experience with integrating large language models into various
  tasks

Base Salary Range: $189,000 - $290,000 a year
There are three major components
to compensation for this position: salary, Amazon Restricted Stock
Units (RSUs), and Zoox Stock Appreciation Rights. A sign-on bonus
may be offered as part of the compensation package. The listed
range applies only to the base salary. Compensation will vary based
on geographic location and level. Leveling, as well as positioning
within a level, is determined by a range of factors, including, but
not limited to, a candidate's relevant years of experience, domain
knowledge, and interview performance. The salary range listed in
this posting is representative of the range of levels Zoox is
considering for this position. Zoox also offers a comprehensive
package of benefits, including paid time off (e.g. sick leave,
vacation, bereavement), unpaid time off, Zoox Stock Appreciation
Rights, Amazon RSUs, health insurance, long-term care insurance,
long-term and short-term disability insurance, and life insurance.

About Zoox:
Zoox is developing the first ground-up, fully autonomous
vehicle fleet and the supporting ecosystem required to bring this
technology to market. Sitting at the intersection of robotics,
machine learning, and design, Zoox aims to provide the next
generation of mobility-as-a-service in urban environments. We’re
looking for top talent that shares our passion and wants to be part
of a fast-moving and highly execution-oriented team.
Follow us on LinkedIn.

Accommodations:
If you need an accommodation to participate in the application or
interview process, please reach out to [email protected] or your
assigned recruiter.

A Final Note:
We may use
artificial intelligence (AI) tools to support parts of the hiring
process, such as reviewing applications, analyzing resumes, or
assessing responses. These tools assist our recruitment team but do
not replace human judgment. Final hiring decisions are ultimately
made by humans. If you would like more information about how your
data is processed, please contact us.
Keywords: Zoox, Concord, Senior Machine Learning Engineer - Scene Understanding, Engineering, Boston, New Hampshire