Crusoe Energy

Senior Engineering Manager

Dublin, IE
Python, Kubernetes, Kafka, RabbitMQ, Terraform, Ansible, Prometheus, VictoriaMetrics, Golang
Description

Senior Manager, Engineering

Department: Cloud Engineering

Location: Dublin - IE

Employment Type: Full-Time

Crusoe is on a mission to accelerate the abundance of energy and intelligence. As the only vertically integrated AI infrastructure company built from the ground up, we own and operate each layer of the stack — from electrons to tokens — to power the world's most ambitious AI workloads. When you join Crusoe, you join a team that is building the future, faster.

We're in the midst of the greatest industrial revolution of our time. The demand for AI compute is boundless, and power is a bottleneck. We're solving that — with an energy-first approach that makes AI infrastructure better for the world and faster for the people innovating with AI.

We're looking for problem-solving, opportunity-finding teammates with a sense of urgency, who believe in the scale of our ambition and thrive on a path not fully paved — people who want to grow their careers alongside a team of experts across energy, manufacturing, data center construction, and cloud services.

If you want to do the most meaningful work of your career, help our customers and partners advance their AI strategies, and be part of a high-performing team that believes in each other, come build with us at Crusoe.

About This Role:

Crusoe is building the cloud infrastructure that powers the next generation of AI, and we're looking for a Senior Engineering Manager, Production Engineering to lead the team that keeps it running. This is a senior people management role reporting to the Director of Production Engineering — sitting at the intersection of deep technical leadership and organizational impact, with direct ownership over the reliability and operational health of Crusoe's production GPU infrastructure. You'll lead and develop a 24/7 team responsible for incident response, monitoring and alerting, automation, and continuous system improvement across a fast-scaling, high-stakes environment, while also shaping the broader strategy, culture, and structure of the function.

The ideal candidate is a seasoned technical leader who has built, scaled, and managed on-call operations teams in complex environments — someone who brings both rigor and vision to SLOs and postmortems, takes coaching and performance management seriously, and can drive alignment across engineering leadership on reliability strategy. If you're energized by the challenge of building a high-performing team while keeping complex systems reliable at scale, this role offers significant ownership and strategic impact at a critical moment in Crusoe's growth.

What You'll Be Working On:

  • Team Leadership & Development: Manage, coach, and grow a team of production engineers across shifts and time zones. Run structured 1:1s focused on career development, deliver candid performance feedback, and build a team culture grounded in ownership and continuous improvement.

  • Hiring & Onboarding: Partner with engineering leadership and recruiting to grow the team — owning the full hiring lifecycle from interview design to offer. Build and continuously improve onboarding and training programs that ramp new engineers quickly and effectively.

  • Incident Management: Serve as an escalation point for high-severity incidents. Lead postmortems with a focus on systemic fixes, ensure action items are tracked and completed, and drive down MTTR over time.

  • Reliability & SLO Ownership: Define, monitor, and report on SLIs, SLOs, and SLAs across Crusoe's production systems. Surface trends proactively and partner with engineering teams to address reliability gaps before they become customer issues.

  • Monitoring & Alerting: Oversee the design and maintenance of alerting and observability systems across bare-metal and cloud infrastructure, ensuring the team has the signal it needs to detect and respond to issues fast.

  • Automation & Toil Reduction: Identify and prioritize opportunities to automate repetitive operational work, improving team efficiency and system resilience over time.

  • Cross-Functional Partnership: Collaborate with infrastructure, platform engineering, product, and customer success teams to align on technical escalations, customer impact, and engineering priorities.

  • Operational Cadence: Own the team's day-to-day operational rhythm — stand-ups, on-call rotations, incident reviews, and sprint planning — ensuring the team runs smoothly across time zones.

What You'll Bring to the Team:

  • 6+ years of experience managing 24/7 technical operations or SRE teams in cloud or data center environments, including demonstrated success developing senior engineers, building organizational capability, and improving operational outcomes at scale.

  • Strong Linux and infrastructure fundamentals, including hands-on experience with containerization, Kubernetes, and virtualization in production environments.

  • Observability and monitoring expertise, including experience with Prometheus, VictoriaMetrics, and custom exporters — ideally against bare-metal endpoints.

  • Familiarity with messaging and workflow systems such as RabbitMQ, Kafka, NATS, or Temporal, and an understanding of how they function in distributed production environments.

  • Working proficiency in Golang or Python — enough to review production code, contribute meaningfully to technical design discussions, and support your engineers' work.

  • Demonstrated people management skills, including experience with structured performance management, individualized coaching, and building or improving onboarding and training programs.

  • SLA/SLO ownership experience — you've set them, measured them, reported on them, and held teams accountable to them in a customer-facing environment.

  • A track record of influencing cross-functional strategy and driving alignment across engineering leadership on operational priorities.

Bonus Points:

  • Experience with GPU infrastructure, HPC, or AI/ML cloud environments.

  • Familiarity with infrastructure-as-code tooling such as Terraform or Ansible.

  • Experience scaling an operations team and function through a period of rapid headcount or infrastructure growth.

  • Background in data center operations, including familiarity with physical infrastructure, hardware lifecycle, and network fundamentals.

Benefits:

Crusoe also offers a competitive benefits package designed to support financial security, health, and overall well-being, including pension contributions, private health and dental insurance, income protection, life assurance and more.

Compensation:

Compensation will be paid as a salary or an hourly wage. It will be determined by the applicant’s education, experience, knowledge, skills, and abilities, as well as internal equity and alignment with market data.

Crusoe is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, disability, genetic information, pregnancy, citizenship, marital status, sex/gender, sexual preference/orientation, gender identity, age, veteran status, national origin, or any other status protected by law or regulation.
