Mission Summary

At Motional, data play a critical role in fueling our ML-centered autonomous vehicles. Our robotaxi fleet collects petabytes of data on the road every day. The Data Mining team mines and filters this massive influx of fleet data by developing billion-scale data workflows and state-of-the-art mining algorithms. Through our mining and learning frameworks, we continuously improve the on-road performance of ML products for perception, prediction, and planning with every mile driven.

We mine for model errors, anomalies, rare objects, and long-tail driving scenarios across millions of driving hours. These are used for laser-focused ML model training and continuous edge-case validation. We are looking for an engineer to spearhead new mining strategies and workflows and help deliver high-quality data that improve our core ML products.

What you'll be doing:

• Spearhead the development of cutting-edge data products by adapting and extending Vision-Language Models (VLMs) and other multimodal foundation models. This includes applying advanced techniques like fine-tuning, RAG, in-context learning, continual pre-training, and knowledge distillation.

• Design and curate high-quality multimodal datasets crucial for training and evaluating multimodal foundation models. This includes developing innovative strategies for data curation, dataset creation, and synthetic data generation to optimize multimodal foundation models for long-tail event mining.

• Drive the in-depth analysis of multimodal foundation models' performance, generalization, and robustness in diverse real-world settings.

What we’re looking for:

• MS/PhD in computer science or related fields with a strong emphasis on multimodal foundation models

• Strong publication record in premier conferences (e.g., CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR) demonstrating significant contributions to the field of vision-language understanding or multimodal foundation models

• Proficiency in Python and deep learning frameworks such as PyTorch, with a demonstrated ability to write clean, efficient, and maintainable code

Bonus points (not required):

• Experience in the application of Vision-Language Models (VLMs) or other multimodal foundation models to data mining in real-world settings

• Experience in production deployment of Vision-Language Models (VLMs) or other multimodal foundation models for real-world applications (e.g., image/video captioning, open-vocabulary image/video searching)

• Experience with data from diverse sensor modalities (e.g., camera, lidar, radar)

• Experience in applied machine learning for autonomous driving

The salary range for this role is an estimate based on a wide range of compensation factors including but not limited to specific skills, experience and expertise, role location, certifications, licenses, and business needs. The estimated compensation range listed in this job posting reflects base salary only. This role may include additional forms of compensation such as a bonus or company equity. The recruiter assigned to this role can share more information about the specific compensation and benefit details associated with this role during the hiring process.

Candidates for certain positions are eligible to participate in Motional’s benefits program. Motional’s benefits include but are not limited to medical, dental, vision, 401(k) with a company match, health savings accounts, life insurance, pet insurance, and more.

Salary Range
$175,000—$234,000 USD

Motional is a driverless technology company making autonomous vehicles a safe, reliable, and accessible reality. We’re driven by something more.

Our journey is always people first.

We aren't just developing driverless cars; we're creating safer roadways, more equitable transportation options, and making our communities better places to live, work, and connect. Our team is made up of engineers, researchers, innovators, dreamers and doers, who are creating a technology with the potential to transform the way we move.

Higher purpose, greater impact.

We’re creating first-of-its-kind technology that will transform transportation. To do so successfully, we must design for everyone in our cities and on our roads. We believe in building a great place to work through a progressive, global culture that is diverse, inclusive, and ensures people feel valued at every level of the organization. Diversity helps us to see the world differently; it’s not only good for our business, it’s the right thing to do.

Scale up, not starting up.

Our team is behind some of the industry's largest leaps forward, including the first fully autonomous cross-country drive in the U.S., the launch of the world's first robotaxi pilot, and operation of the world's longest-standing public robotaxi fleet. We’re driven to scale; we’re moving towards commercialization of our technology, and we need team members who are ready to embrace change and challenges.

Formed as a joint venture between Hyundai Motor Group and Aptiv, Motional is fundamentally changing how people move through their lives. Headquartered in Boston, Motional has operations in the U.S. and Asia. For more information, visit www.Motional.com and follow us on Twitter, LinkedIn, Instagram and YouTube.

Motional AD Inc. is an EOE. We celebrate diversity and are committed to creating an inclusive environment for all employees. To comply with Federal Law, we participate in E-Verify. All newly hired employees are queried through this electronic system established by the DHS and the SSA to verify their identity and employment eligibility.

Senior Software Engineer, Vision Language Models
