EverOps

Lead Site Reliability Engineer, IT Support Automation

Remote
Python Go API OAuth SAML AWS GCP Azure Kubernetes Terraform LLM
Description

Department: DevOps

Location: HQ

Employment Type: Full-Time

Overview

As technology organizations scale, so does operational friction. IT support teams become overloaded with repetitive tickets — account lockouts, access requests, provisioning tasks, and standard “ask IT” issues that drain time and attention from higher-value work.

EverOps partners directly with enterprise engineering and IT organizations to solve complex operational challenges from within their environments. We don’t patch symptoms — we eliminate root causes.

We are seeking a Lead Site Reliability Engineer to own and execute a comprehensive IT support automation strategy designed to significantly reduce ticket volume and human intervention.

The Challenge

This is not a reactive support role.

This is a systems-level engineering role focused on:

  • Eliminating tickets before they are created

  • Automating resolution paths when tickets do occur

  • Building durable automation frameworks across SaaS and internal platforms

  • Removing systemic friction across the IT lifecycle

You will operate heavily within the IT support domain, addressing areas such as:

  • Account lockouts and access management

  • Provisioning and deprovisioning workflows

  • Device and asset lifecycle management

  • Standard internal IT requests

  • SaaS integrations and workflow orchestration

The expectation is leadership-level ownership. You will define the automation roadmap, architect solutions, and drive initiatives from intake through deployment with measurable outcomes.

The Mission

As a Lead SRE, your mission is to:

  • Reduce human intervention across IT support workflows

  • Build automation systems that scale without increasing headcount

  • Architect reliable, observable, production-grade automation services

  • Establish engineering standards for automation development

  • Mentor junior engineers while maintaining direct ownership of delivery

Success is measured in outcomes:

  • Reduced ticket creation rates

  • A higher percentage of tickets resolved fully automatically

  • Improved user satisfaction while lowering operational burden

This role requires deep technical capability combined with strong execution discipline and cross-functional influence.

What You’ll Do

1. Root-Cause Ticket Elimination

  • Analyze ticket trends and identify systemic failure patterns

  • Redesign workflows to remove recurring pain points

  • Replace reactive fixes with preventative engineering solutions

  • Partner with IT and engineering stakeholders to prioritize high-leverage automation opportunities

2. End-to-End Automation Architecture

  • Design and implement automation workflows across multiple SaaS platforms

  • Integrate with third-party and internal APIs (e.g., identity providers, collaboration tools, asset systems, ticketing platforms)

  • Architect resilient API integrations including:

    • Authentication & authorization flows (OAuth 2.0, SAML, token management)

    • Rate limiting and retry strategies

    • Error handling and observability

  • Build self-service systems that allow users to resolve common requests without human escalation

3. Custom Service & Tooling Development

When no off-the-shelf solution exists, you will:

  • Build lightweight microservices or serverless functions (Python or Go preferred)

  • Develop internal middleware, proxies, or orchestration services

  • Create background automation jobs (cron-style processes)

  • Containerize and deploy services using modern DevOps practices

You will make thoughtful build-vs-buy decisions, balancing speed, maintainability, and long-term scalability.

4. Reliability, Observability & Production Standards

Automation must be as reliable as any production system.

You will:

  • Implement Infrastructure as Code (Terraform, Pulumi, or similar)

  • Maintain CI/CD pipelines for automation services

  • Design monitoring, logging, and alerting frameworks

  • Define SLIs/SLOs to measure automation reliability

  • Ensure automation services are secure, observable, and resilient

This is not scripting — this is platform-grade engineering.

5. Lead-Level Ownership & Execution

This role requires operating as a single-threaded owner for major initiatives.

You will:

  • Define solution architecture from concept to deployment

  • Set timelines and milestones autonomously

  • Conduct feasibility validation in development environments

  • Communicate proactively with stakeholders

  • Re-scope tactically to maintain forward momentum when blocked

  • Deliver measurable impact — not just activity

You are expected to think systemically, move with urgency, and drive initiatives to completion without micromanagement.

You Have

Experience

  • 8+ years in SRE, Platform Engineering, DevOps, or Automation Engineering

  • Proven experience designing enterprise-scale automation systems

  • Strong exposure to IT support domains (access, provisioning, identity, device lifecycle, SaaS operations)

Technical Strength

API & Integration Expertise

  • Deep experience designing and consuming REST APIs

  • Strong understanding of authentication and authorization patterns

  • Experience orchestrating workflows across multiple SaaS platforms

Programming & Automation

  • Strong proficiency in Python or Go

  • Experience building production-ready services

  • Advanced scripting for orchestration and automation logic

Cloud & Infrastructure

  • Strong familiarity with at least one major cloud provider (AWS, GCP, or Azure)

  • Containerization and Kubernetes exposure

  • Infrastructure as Code experience

Systems Thinking

  • Networking fundamentals

  • Identity and access concepts

  • Understanding of asset lifecycle management

Leadership & Communication

  • Experience leading technical initiatives from idea through deployment

  • Ability to mentor junior engineers

  • Strong written and verbal communication skills

  • Comfortable influencing cross-functional stakeholders

  • A data-driven approach to decision-making

You think in terms of leverage, scale, and long-term impact.

What Success Looks Like

Within 6–12 months, you will have:

  • Eliminated entire categories of recurring IT tickets

  • Implemented durable automation frameworks across core IT workflows

  • Increased automated resolution rates quarter over quarter

  • Reduced manual provisioning and access overhead

  • Established scalable, observable automation systems that continue to compound value

Your impact will be visible in metrics — not anecdotes.

Nice to Have

  • Experience integrating AI/LLM capabilities into workflow automation

  • Familiarity with ITSM frameworks

  • Background building internal self-service platforms

  • Experience presenting technical strategy to senior leadership

  • Experience operating in high-scale, compliance-sensitive environments

Benefits

  • 100% Remote Workplace

  • Unlimited Paid Time Off

  • Equity – Become a true owner of the company

  • 401(k) with company contribution, plus sponsored healthcare

  • Professional Growth – Access to training and certification programs
