Member of Technical Staff - Efficient ML
Embedding Vc
San Francisco, California, United States
Permanent
Machine Learning, Dataloaders, Gradient Checkpointing, Distributed Training, FSDP, ZeRO, Tensor Parallelism, NCCL Tuning, GPU Performance, Nsight Profiling, CUDA Kernels, Flash-Attention
Introducing Moonlake, AI for creating world simulations.
Scope of Work
Training efficiency
• Dataloaders, fusion, activation remat, gradient checkpointing.
• FSDP/ZeRO/tensor+pipeline parallel; NC...
January 15, 2026