Wand

Head of Site Reliability Engineering

Remote
AWS, Azure, Terraform, Kubernetes, Python, Bash, SQL, CI/CD, MLOps, AI, Deep Learning
Description

Head of SRE

Location: Europe / Remote

Build the Future Workforce

Wand turns AI into labor. It enables humans and AI agents to operate together as a unified, hybrid workforce, with comprehensive management and oversight. And it’s already operating at scale inside some of the world’s largest organizations.

Wand built the world’s first Agentic Labor Infrastructure, enabling governments and global enterprises to create, manage, and scale digital workforces.

Our mission is to integrate agent ecosystems into the core of work and business, unlocking a generational leap in the global economy. We’re building the infrastructure that lets humans and AI agents operate together safely, transparently, and at scale.

Join Wand in leading the Agentic Shift

Wand is building a high-performing global team whose members take full ownership of what they build. We lead by example, move fast, make data-aware decisions, and continuously push for more, always with a focus on delivering real value to customers.

You would be joining a world-class team that combines deep research expertise and real-world product execution, with experience spanning DeepMind, Google, Amazon, Miro, Elise AI, IBM, and Accern.



Requirements

Position Summary

We are hiring a hands-on Head of SRE to establish, lead, and scale our Site Reliability Engineering function. This role combines strategic ownership with deep technical execution.

You will be responsible for defining reliability standards, building operational processes, and ensuring production stability, while actively architecting infrastructure, improving automation, and embedding SRE best practices across the engineering organisation.

There is significant scope to review, improve, and rebuild our systems, infrastructure, and processes where necessary. You will be instrumental in designing, developing, and maintaining scalable backend systems, ensuring our AI products meet the highest standards.

You will also become part of the product-engineering leadership team, contribute to scaling the organization, and report directly to the CPTO.



Responsibilities

  • Own and lead all SRE-related strategy, standards, and execution. Embed SRE culture and operational excellence across engineering teams.
  • Review the current infrastructure and operational model; redesign and rebuild where needed.
  • Architect, deploy, and maintain scalable, secure production environments.
  • Define and implement SLIs, SLOs, and uptime targets.
  • Establish robust monitoring, alerting, and observability practices.
  • Design and implement incident management, root cause analysis (RCA), and postmortem processes.
  • Build and manage sustainable on-call frameworks and escalation models.
  • Automate the software delivery lifecycle to improve release predictability and safety.
  • Create reproducible environments and infrastructure-as-code (IaC) provisioning templates.
  • Improve system performance, availability, and reliability.
  • Support and productionise data platforms and ML workloads.
  • Partner closely with QA and Engineering leadership to improve release quality and stability.
  • Ensure infrastructure meets enterprise-grade security and regulatory requirements.
  • Hire, manage, and mentor a team of site reliability engineers.


Key Requirements

  • Proven hands-on experience in Site Reliability Engineering, Production Engineering, or a similar role.
  • Strong hands-on expertise in cloud infrastructure (AWS or Azure preferred), infrastructure as code (Terraform), and Kubernetes.
  • Experience building or maturing SRE practices within an organisation.
  • Demonstrated ability to improve uptime, reliability, and operational processes.
  • Deep understanding of CI/CD, developer experience, infrastructure-as-code, and automation.
  • Experience designing on-call processes and incident response frameworks.
  • Experience managing at least one team of site reliability engineers.
  • Strong communication skills, with the ability to influence across teams.
  • Experience supporting data platforms and ML systems in production environments.
  • MLOps experience (model deployment, monitoring, retraining workflows).


Preferred Experience

  • Background in large-scale global B2B/B2C products.
  • Background in enterprise environments with security and compliance requirements.
  • Expertise in ML, AI, LLMs.
  • Experience implementing regulatory controls within cloud infrastructure.
  • Experience evaluating and managing infrastructure vendors and tooling.
  • Experience scaling systems in high-growth environments.
  • Experience collaborating with large-scale enterprise customers to deploy and operate environments within their accounts and VPCs.


Personal Characteristics

  • Practical and hands-on; willing to lead from the front.
  • Strong operational mindset with clear opinions on best practices.
  • Structured thinker who can build processes from ambiguity.
  • High ownership mentality and accountability.
  • Learning-oriented with a continuous improvement mindset.
  • Excellent communication and interpersonal skills.
  • Continuous drive for improvement and innovation.