Qwilt

SRE Engineer

US
Docker AWS GCP Bash Redis MySQL MongoDB Kubernetes Puppet Terraform Shell Ansible Python PostgreSQL SQL
Description

Senior Site Reliability Engineer (SRE) - Infrastructure

About the team:

Our globally distributed team of senior engineers is dedicated to managing and optimizing infrastructure for multi-cloud production services. We specialize in infrastructure monitoring, automation, tools management, and deployment across various cloud platforms. As part of our responsibilities, we actively participate in follow-the-sun on-call rotations. We operate in a dynamic, multi-tasking environment that requires constant learning and adaptation. By ensuring the reliability of our systems, we directly impact the success of our business.


Tech Stack:

●     Version Control: GitLab

●     Continuous Delivery: Argo CD

●     Container Orchestration: Kubernetes

●     Configuration Management: Puppet, Ansible

●     Automation: Rundeck

●     Monitoring & Alerting: InfluxDB, Prometheus, Thanos, Grafana, Zabbix

●     Logging: Coralogix

●     Infrastructure as Code: Terraform

●     Caching: Memcached

●     Scripting: Shell, Python

●     Cloud Platforms: AWS, GCP

 

About the role:

As a Senior Site Reliability Engineer (SRE) specializing in Infrastructure, you will play a critical role in managing and optimizing our multi-cloud production services. You will be responsible for infrastructure monitoring, automation, tools management, and production stability. This role requires active participation in follow-the-sun on-call rotations to ensure the reliability and availability of our services.



What You Will Do:

●     Manage and optimize multi-cloud production services infrastructure.

●     Implement and maintain infrastructure monitoring solutions using Prometheus, Thanos, Grafana, and other tools.

●     Develop automation scripts in Bash and Python to streamline operational tasks.

●     Manage tools such as Puppet, Ansible, Rundeck, and Teleport.

●     Collaborate with cross-functional teams to enhance system reliability and performance.

●     Contribute to the architecture and scalability of our systems.

●     Participate in follow-the-sun on-call rotation to respond to incidents and ensure system availability.

●     Troubleshoot and resolve infrastructure issues across our cloud environments.

●     Drive best practices for reliability, scalability, and observability.

●     Mentor and guide other teams in best practices and technologies.

●     Contribute to the design and implementation of scalable, reliable, and secure solutions.


 

Required Experience:

●     Minimum of 2 years of hands-on experience with Kubernetes.

●     Minimum of 5 years of experience as an SRE, DevOps, Cloud, or Systems Engineer.

●     At least 1 year of experience working with cloud environments (AWS, Google Cloud Platform).

●     Strong understanding of infrastructure monitoring tools such as Prometheus (Mimir/Thanos/Cortex), including deployment and management.

●     Proficiency in Bash and Python scripting for automation tasks.

●     Experience with SQL and NoSQL databases, such as MySQL, PostgreSQL and MongoDB.

●     Familiarity with in-memory key-value stores such as Redis and Memcached.

●     Solid understanding of networking and web applications, with emphasis on TCP/IP stack, SSL/TLS, and HTTP protocols.

 

 

Additional Skills (Preferred):

●     Experience with Terraform for infrastructure as code.

●     Knowledge of containerization technologies such as Docker.

●     Understanding of CI/CD pipelines.

●     Familiarity with logging and monitoring tools such as Coralogix.

 

Why Join Us:

●     Opportunity to work with a globally distributed team of senior engineers.

●     Dynamic and challenging environment that encourages constant learning and growth.

●     Direct impact on the reliability and success of our business.

●     Exposure to cutting-edge technologies and cloud platforms.

If you are passionate about infrastructure reliability and automation and thrive in a fast-paced environment, we would love to hear from you. Join us in delivering the best experience for our customers and ensuring the success of our business. Apply now to be part of our innovative team!

 

 

 
