Growe

System Reliability Engineer/DevOps (Remote)

Remote
AWS, EC2, ECS, EKS, RDS, DocumentDB, ElastiCache, Keyspaces, S3, EBS, VPC, Route53, KMS, ACM, CloudWatch, Terraform, Terragrunt, Atlantis, GitLab CI, FluxCD, Argo Rollouts, Ansible, Python, Bash, Docker, Kubernetes, Helm, KEDA, VPA, Karpenter, External-DNS, ingress-nginx, aws-alb-controller, ebs-csi-driver, Grafana, VictoriaMetrics, Tempo, Pingdom, PagerDuty, VMAlert, Alertmanager, Loki, OpenSearch, Vector Agent, Network Firewall, Transit Gateway, Site-to-Site VPN, Vault, SOPS, Cloudflare, KubeCost
Description

Location: Anywhere

Department: Cloud Infrastructure Operations

Growe welcomes those who are excited to:
  • Ensure availability, performance, and scalability of infrastructure and services through monitoring, automation, and operational best practices;

  • Lead incident response, perform root cause analysis, and implement recovery and long-term fixes;

  • Manage infrastructure using Terraform, Terragrunt, and automation tools for consistency and repeatability;

  • Implement and maintain metrics, logs, and tracing solutions (Prometheus, Grafana, Loki, VictoriaMetrics, CloudWatch) to ensure system visibility;

  • Identify bottlenecks, tune systems, and improve infrastructure performance;

  • Monitor resources, forecast growth, and implement scaling strategies;

  • Integrate security best practices into IaC, CI/CD pipelines, and deployments;

  • Support vulnerability management;

  • Participate in 24/7 on-call rotations (once a week) to ensure timely resolution of critical incidents;

  • Work with DevOps, PRE, development, and security teams to improve reliability and design resilient systems;

  • Maintain operational runbooks, incident reports, and system documentation.

We need your professional experience:
  • 3+ years in a DevOps, SRE, or related role;

  • Strong hands-on experience with AWS services including EC2, ECS, EKS, RDS, DocumentDB, ElastiCache, Keyspaces, S3, EBS, VPC, Route53, KMS, ACM, and CloudWatch;

  • Proficiency with Terraform, Terragrunt, and Atlantis for reproducible and version-controlled infrastructure;

  • Experience with GitLab CI, FluxCD, Argo Rollouts, and automation tools (Ansible, Python, Bash);

  • Solid experience with Docker, Kubernetes (AWS EKS), and Helm (including custom templates, ChartMuseum);

  • Familiarity with cluster add-ons such as KEDA, VPA, Karpenter, External-DNS, ingress-nginx, aws-alb-controller, and ebs-csi-driver;

  • Hands-on experience with Grafana, VictoriaMetrics stack, Tempo, metrics exporters, Pingdom, AWS CloudWatch, and alerting systems like PagerDuty, VMAlert, and Alertmanager;

  • Proficiency with Grafana Loki, OpenSearch, and Vector Agent for centralized logging;

  • Strong understanding of networking concepts, AWS networking (VPC, Network Firewall, Transit Gateway, Site-to-Site VPN), identity and access management, certificate management (ACM, Vault, SOPS), and application security best practices;

  • Familiarity with Cloudflare services, including caching, DNS, and Workers;

  • Exposure to AWS Cost Explorer, KubeCost, and custom cost export tools;

  • Certifications in AWS, Terraform, Kubernetes, or Helm are a plus.

We appreciate it if you have these personal qualities:
  • Problem-Solving Mindset: Approaches complex issues methodically and finds practical solutions under pressure;

  • Analytical Thinking: Able to interpret metrics, logs, and system behavior to make informed decisions;

  • Attention to Detail: Ensures accuracy in infrastructure changes, configurations, and deployment processes;

  • Adaptability: Comfortable learning new tools and technologies and adjusting to changing environments;

  • Collaboration & Teamwork: Works effectively with cross-functional teams and communicates clearly;

  • Ownership & Responsibility: Takes accountability for tasks, incidents, and service reliability;

  • Continuous Learning: Keeps up to date with DevOps, SRE, cloud, and security best practices;

  • Effective Communication: Can explain technical concepts clearly to both technical and non-technical stakeholders.

We are seeking those who align with our core values:
  • GROWE TOGETHER: Our team is our main asset. We work together and support each other to achieve our common goals;

  • DRIVE RESULT OVER PROCESS: We set ambitious, clear, measurable goals in line with our strategy and drive Growe to success;

  • BE READY FOR CHANGE: We see challenges as opportunities to grow and evolve. We adapt today to win tomorrow.
