CoreWeave

Manager, Cloud Operations Engineering

Remote Brooklyn, NY
USD 180k - 200k
Description

CoreWeave is a specialized cloud provider, delivering a massive scale of GPU compute resources on top of the industry’s fastest and most flexible infrastructure. CoreWeave builds cloud solutions for compute intensive use cases — VFX and rendering, machine learning and AI, batch processing, and Pixel Streaming — that are up to 35 times faster and 80% less expensive than the large, generalized public clouds. Learn more at www.coreweave.com.

About the role:

The Cloud Operations Team is the heart of CoreWeave’s operational practice. This team responds to performance and availability issues across the CoreWeave cloud, bridging the gap between Customer Support and internal service-owning teams. Working in shifts to ensure 24x7 coverage, the team develops proactive health monitoring, triages alerts and incidents, serves in the commander role during Priority Incident events, and participates in ongoing analysis and reliability improvement practices.

Collaborating across development and engineering, this team operates horizontally and vertically within the CoreWeave ecosystem to root out problems, initiate and coordinate responses, and drive lower MTTR and MTTD scores.

The newly formed team is staffed with engineers who have broad technology and troubleshooting skills and are actively expanding their knowledge in critical areas such as networking, storage, Kubernetes, automation, and observability. You will bootstrap the team’s processes and procedures and be their direct manager.

As the people leader for this team of 8 Operations Engineers, you will facilitate and empower their success. Drawing on your experience in Cloud Operations, you understand deeply the importance of process, documentation, and automation. You strive for continual improvement. You will maintain a close working relationship with each of your team members through regular 1:1s that focus on the ‘whole engineer’, guiding them in their skills and career development at CoreWeave. Engineers on your team are likely to mature into strong individual contributors on peer engineering teams across the organization, and you will help them prepare while simultaneously providing exceptional support to those same teams.

As Manager of the Cloud Operations Team you will:

  • Grow, change, invest in your teammates, be invested-in, share your ideas, listen to others, be curious, have fun, and above all, be yourself.
  • Learn and navigate the tools, systems and processes that enable the AI cloud.
  • Bootstrap the team’s operational processes, and build a road map of the key project work and tooling requirements for the team’s success.
  • Own staffing, scheduling and HR responsibilities.
  • Develop and lead team cadence and planning sessions in conjunction with our Technical Project Manager.
  • Develop internal processes, procedures, and documentation to ensure efficient management of the team’s workload.
  • Track and report on key metrics that represent the team’s improvement and impact.
  • Act as the Sr. Incident Commander, and develop the team’s ability to efficiently operate Major Incidents.
  • Participate as a key member of the enterprise ITSM cadence, reporting on incident trends, durations (MTTR, MTTD, etc.), problems, and Incident Reviews.
  • Own the Post Incident Review process.
  • Continually improve our incident response process with the goal of iteratively reducing MTTR through all reasonable methods (tooling, process, automation, etc.).
  • Partner with service owners, SRE, and Customer Support to ensure process alignment, knowledge sharing, and shared responsibility for Incident Management, Post Incident Reviews, Production Readiness Assessments, etc.

Wondering if you’re a good fit? We believe in investing in our people, and value candidates who can bring their own diversified experiences to our teams – even if you aren't a 100% skill or experience match. Here are some qualities we’ve found compatible with our team. If a portion of this resonates with you, we’d love to talk. 

  • You come with your own philosophies and strategies, are adaptable to new information, freely provide feedback and coaching, and are an active participant in improving how the team functions.
  • You have experience with business process development and can see where communication breakdowns are likely to occur.
  • You are committed to understanding the needs of others, and how you can effectively leverage your own talents to ensure collective success.
  • You are comfortable using observability data to visualize service health and triangulate the proximate cause of performance and availability issues.
  • You are comfortable making sense of complex environments and leading others through troubleshooting without actively fixing things yourself.
  • You can lead when there’s ambiguity, and follow when engineers lead.
  • You have experience in a support capacity and/or a broad understanding of modern applications and infrastructure.
  • You are comfortable managing communication and coordinating multiple engineers during an incident.
  • You have a desire to learn or have experience with process automation.
  • You have a customer-first mindset and bring empathy for the customer as well as the engineering team that’s tasked with solving complex problems.
  • You’re excited to join a team with diverse perspectives and backgrounds that believes in tackling challenges, growing hand in hand, and winning together.

Our compensation reflects the cost of labor across several US geographic markets. The base pay for this position ranges from $180,000 to $200,000/year. Pay is based on a number of factors including market location and may vary depending on job-related knowledge, skills, and experience.

Hybrid Workplace

If you reside within a 30-mile radius of our New Jersey, New York, or Philadelphia offices, we're excited for you to join us at the office at least three times a week, recognizing the significance we place on fostering connections, collaboration, and creativity within our office culture. Our commitment to operating as a hybrid workplace underscores our dedication to enabling our employees to tailor their work-life balance to their individual preferences.

Why CoreWeave?

At CoreWeave, we work hard, have fun, and move fast!  We’re in an exciting stage of hyper-growth that you will not want to miss out on. We’re not afraid of a little chaos, and we’re constantly learning. Our team cares deeply about how we build our product and how we work together, which is represented through our core values: 

  • Be Curious at your Core
  • Act like an Owner
  • Empower Employees
  • Deliver Best In-Class Client Experience 
  • Achieve More Together

We support and encourage an entrepreneurial outlook and independent thinking. We foster an environment that encourages collaboration and provides the opportunity to develop innovative solutions to complex problems. As we get set for takeoff, the growth opportunities within the organization are constantly expanding. You will be surrounded by some of the best talent in the industry, who will want to learn from you, too. Come join us!

Benefits

We offer a competitive salary and benefits, including:

  • Medical, dental and vision insurance - 100% paid for the employee
  • Company paid Life Insurance 
  • Voluntary supplemental life insurance 
  • Short and long-term disability insurance 
  • Flexible Spending Account
  • Tuition Reimbursement 
  • Mental Wellness Benefits through Spring Health 
  • Family-Forming support provided by Carrot
  • Paid Parental Leave 
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our offices
  • Weekly massages in NJ office
  • A casual work environment
  • Work culture focused on innovative disruption

California Consumer Privacy Act - California applicants only

CoreWeave is an equal opportunity employer, committed to diversity and inclusiveness. We will consider all qualified applicants without regard to race, color, nationality, gender, gender identity or expression, sexual orientation, religion, disability, or age.

 
