Software Engineer - Singularity, Platform Infrastructure

The platform and infrastructure part of the new AI systems team is looking for engineers who are passionate about solving the performance, scalability, reliability, and efficiency problems of large-scale AI services, and who want to build and manage one of the largest AI services in the industry.

Software Engineer – Singularity, Control Plane

As an engineer on the Singularity Control Plane team, you will be at the forefront of building a planet-scale, fully decentralized control plane, which is at the core of the AI Supercomputer's global distribution infrastructure.

Software Engineer - Singularity, Distributed Scheduler

You will be working on a distributed scheduler for AI workloads that is aware of the workloads themselves, the capabilities of the diverse accelerator resources, and dynamic environments.

Software Engineer, Singularity, Storage/Distributed Cache

You will work on a co-located, co-partitioned AI cache layer that helps speed up training jobs and increase the utilization of compute and hardware accelerator resources.

Software Engineer - Singularity, Distributed Training and Inferencing

You will be working on devising native support for diverse distributed execution strategies in Singularity: data parallelism, model parallelism, general pipelining (e.g., PipeDream, GPipe), and 2D/3D parallelism.

Software Engineer - Inferencing

Do you want to work on building a planet-scale artificial intelligence (AI) system? The central Azure team is looking for truly exceptional software engineers to be part of a specialized startup team to build the next generation of cloud-based AI systems. Our work encompasses a wide array of hardware, compilers, distributed systems, operating systems, networking, and datacenter technologies. The platform and infrastructure part of the new AI systems team is looking for engineers who are passionate about solving the performance, scalability, reliability, and efficiency problems of large-scale AI services, and who want to build and manage one of the largest AI services in the industry.

Software Engineer – Distributed Service

In this role you will build the scheduling subsystem that is responsible for delivering on the SLAs for AI training and inferencing workloads. Specifically, you will work on fault detection mechanisms, topology-aware scheduling algorithms, checkpoint/restore, and elasticity capabilities across hardware and software stacks.

Software Engineer, Singularity, Data Plane/Compute

You will shape the future of compute technology in the AI supercomputer, including cluster availability orchestration and containerization/virtualization technology, to bring distributed deep learning training and inferencing to life.

Software Engineer - Singularity, Data Plane, Networking

You will work on programming a special high-bandwidth network to optimize the performance of synchronous SGD-based distributed model training. You will also work on providing network integration with diverse accelerator types, including GPUs, with InfiniBand (IB).

Software Engineer - Singularity, GPU/FPGA/AI Accelerators

You will be working on a consistent model for supporting a diverse set of accelerators (GPU/FPGA/AI Accelerators) and on enabling the provisioning and scaling of accelerator devices based on AI workload needs.

Software Engineer - Billing Service

You will be working on building a Billing Service for Singularity that supports a diverse set of financial offers for AI training and inferencing workloads targeting diverse accelerators (GPU/FPGA/AI Accelerators), and that enables the lowest possible cost infrastructure for our customers based on AI workload needs.

Software Engineer - Singularity, PyTorch, TensorFlow Internals

You will work on deeply integrating frameworks like PyTorch and TensorFlow within Singularity, and on providing native support for elasticity, checkpointing, data loading, and other optimizations as model execution progresses.