Qode

Site Reliability Engineer Architect

Austin, TX
AWS, GCP, Azure, Kubernetes, Terraform, Ansible, Python, Go, Bash, Prometheus, Grafana, Datadog, ELK, Splunk, Kafka, Spark
Description

SRE Architect

Location: Austin, Texas, United States

Workplace: Hybrid

Employment Type: Full-time

Job Description: SRE Architect

πŸ“ Location: Austin, TX Hybrid)
πŸ•’ Employment Type: Full-Time
🎯 Experience Level: Architect

Role Overview

We are seeking an experienced Site Reliability Engineer (SRE) Architect to design, build, and scale highly reliable, resilient, and observable systems. This role is ideal for a hands-on architect who can define SRE strategy, influence engineering practices, and partner closely with development, platform, and security teams.
The position requires onsite or hybrid presence in Austin, TX, with collaboration across distributed teams.

Key Responsibilities

Architecture & Reliability

  • Define and own the SRE architecture strategy, including reliability, availability, scalability, and performance standards.
  • Design resilient, fault-tolerant systems for cloud-native and hybrid environments.
  • Establish and govern SLIs, SLOs, and error budgets across platforms and services.
  • Lead capacity planning, resilience testing, and chaos engineering initiatives.

Platform & Cloud Engineering

  • Architect and operate platforms on AWS/GCP/Azure (multi-cloud or hybrid setups).
  • Design and manage Kubernetes-based platforms (EKS/GKE/AKS).
  • Drive Infrastructure as Code (IaC) practices using Terraform, Ansible, or similar tools.
  • Standardize environments, deployment patterns, and runtime configurations.

Operational Excellence

  • Build and maintain observability frameworks using tools such as Prometheus, Grafana, Datadog, ELK, Splunk, or equivalent.
  • Lead incident management, root cause analysis (RCA), and post-incident reviews.
  • Reduce MTTR through automation, tooling, and process improvements.
  • Participate in and improve on-call models, escalation policies, and runbooks.

DevOps & Automation

  • Partner with engineering teams to embed CI/CD best practices.
  • Drive automation across provisioning, deployments, testing, and operations.
  • Improve system reliability by eliminating manual operational toil.

Security & Governance

  • Architect secure platforms aligned with enterprise security standards.
  • Implement best practices for secrets management, access control, compliance, and audits.
  • Collaborate with Security and Compliance teams on governance models.

Leadership & Collaboration

  • Act as a technical mentor and thought leader within SRE and platform teams.
  • Influence engineering culture toward reliability-focused design.
  • Partner with product, application, and infrastructure teams to deliver business outcomes.

Required Qualifications

  • 10+ years of experience in SRE, DevOps, Platform Engineering, or Systems Architecture.
  • Strong experience designing and operating large-scale distributed systems.
  • Deep hands-on expertise with cloud platforms (AWS/GCP/Azure).
  • Advanced experience with Kubernetes and containerized workloads.
  • Strong knowledge of Linux internals, networking, storage, and system performance.
  • Proven experience implementing IaC and configuration management.
  • Proficiency in one or more programming/scripting languages (Python, Go, Bash, etc.).
  • Strong understanding of observability, monitoring, and alerting strategies.
  • Excellent communication and stakeholder management skills.

Preferred Qualifications

  • Experience in multi-cloud or regulated environments.
  • Background supporting high-throughput, high-availability, or data-intensive systems.
  • Experience with Kafka, Spark, or large-scale data platforms.
  • Exposure to fintech, healthcare, enterprise SaaS, or hyperscale platforms.
  • Prior experience as Principal Engineer, Architect, or Lead SRE.

Work Model

  • Hybrid / Onsite role based in Austin, TX
  • Requires regular collaboration with local and global teams

Why Join Us

  • Architect systems at enterprise scale
  • Influence platform and reliability strategy across teams
  • Work with modern cloud-native technologies
  • High-impact role with strong visibility and ownership