Gorgias empowers ecommerce brands to grow through AI-powered customer experience. We are the #1 CX platform in the industry, trusted by over 15,000 merchants worldwide – from small independent shops to some of the largest ecommerce brands in the world.
We offer the most integrations of any tool on Shopify (100+) and the ability to get set up fast, without the need for complex onboarding. Gorgias offers its users a unified platform to manage every aspect of their customer support on every channel.
We can automate 60% of a brand’s support so that agents can focus on high-value conversations and driving sales. Plus, we offer purpose-built marketing tools to help merchants convert more shoppers into customers, driving GMV.
About The SRE Team
We are seeking a highly skilled and experienced Senior Site Reliability Engineer (SRE) to join our team. As an SRE at Gorgias, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems, enabling the seamless delivery of our products and services.
The SRE team at Gorgias maintains the core infrastructure and services that make up the heart of our product. We have the privilege of working with high-throughput systems and TB-scale data stores serving billions of queries per day, most with sub-millisecond response times.
We also design and maintain the software delivery stack, offering features such as metrics-based canary rollout strategies to all internal development teams.
We currently have a team of 4 Senior and Staff SREs operating together globally, with the aim of growing to 6 in the near term. We focus on scalable methods to provide the largest impact across the organization.
Some achievements we’re proud of:
Partitioned multi-TB tables in Postgres to reduce Vacuum time by 5x
For the partitioning, we studied the problem and the partitioning strategy, analyzed all queries to avoid bad surprises, and used Debezium and Kafka to do a live copy. We completed it with a maintenance window of less than 20 minutes and no data loss
Split the PostgreSQL connection proxy into multiple pools to guarantee per-service quotas, so that sub-systems that hit the database heavily are contained and do not create a large incident blast radius
For the connection proxying, we had to go deeper into the backend to propose solutions, coded part of the fix in the backend, provided the path, and helped teams migrate to the new methodology. In the end, we successfully eliminated incidents caused by DB connection starvation
Worked with all product-engineering teams to achieve SOC 2 certification, ran a HackerOne program, refactored our whole incident management with Rootly for better visibility and resolution time, and improved our overall security posture
To keep the lights on, the team is constantly working on upgrading our self-hosted Postgres and RabbitMQ, alongside other critical infrastructure components, with minimal downtime and high accuracy
What You Will Do:
Manage multi-TB PostgreSQL clusters in the public cloud, and optimize parameters, storage settings, and data structure
Operate RabbitMQ and Redis with tens of thousands of operations per second
Manage 10+ full-featured GKE clusters worldwide, with 10k+ tenants
Adopt a new stack: Kafka, Debezium, and Apache Flink
Facilitate rollout strategies at scale with GitLab CI and ArgoCD
Roll out best practices around Kubernetes/Helm/Operators, SLIs/SLOs, Incident Management, Observability, Security, and Disaster Recovery to all Product-Engineering teams and drive adoption by them
Automate complex infrastructure pieces for our worldwide footprint, using best-practice IaC with Terraform and strong scripting in Python/Golang
What You Should Have:
Experience with cloud-native web systems at scale
Bachelor's degree in Computer Science or equivalent work experience.
5+ years of experience as a Site Reliability Engineer or in a similar role, with a focus on maintaining high-performance, scalable, and reliable high-throughput web systems.
Proficiency in using Kubernetes for container orchestration and management.
5+ years of experience with cloud providers (AWS, GCP) and a deep understanding of cloud services and architectures.
Proficient in scripting and programming languages such as Python, Bash, Go, or NodeJS.
Comfortable and confident in Linux systems and the command line.
Solid understanding of infrastructure as code (IaC) principles and experience with tools like Terraform.
Experience with continuous integration and deployment (CI/CD) pipelines.
Excellent problem-solving and troubleshooting skills.
Strong communication and collaboration skills with the ability to work effectively in a team environment.
Bonus Points If You Have:
Certification in Kubernetes (e.g., Certified Kubernetes Administrator - CKA).
Certification in a Cloud Provider platform (e.g., AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect).
Experience in managing and optimizing PostgreSQL databases.
Company Benefits and Perks
🏖️ 5-week vacation
🤕 Paid sick leave
🧸 Paid parental leave
💻 MacBook Pro
🏥 We provide private health insurance
🍽️ Monthly lunch stipend of $300 gross added to your salary
💆🏻‍♀️ Get up to €700 (gross) to set up your workstation at home (added to your first paycheck as an onboarding bonus)
📚 Get up to €1,500 of learning budget and a FitPass yearly membership. Take advantage of these resources to grow in your role and prioritize your personal development and wellness.
🥰 Every quarter, we organize an online company-wide summit to discuss where we’re going and strengthen social bonds. Once per year, we organize offsite team retreats and company retreats!
More cool things to know about Gorgias... 😁
Raised our Series C-2 for $29M in May 2024
We went from 0 to 15,000+ merchants using our platform since 2016
We have a 4.3 rating on Glassdoor and a 4.7 culture score on Comparably
What our customers are saying: apps.shopify.com/helpdesk#reviews
Other positions: gorgias.com/about-us/jobs
Discover the Gorgias Platform
Learn about our Compensation Policy
Diversity, Equity, and Inclusion at Gorgias
At Gorgias, we’re dedicated to creating a diverse, inclusive, and equitable workplace where everyone is valued. We provide equal opportunities without discrimination based on race, gender, age, disability, or any characteristic protected by law.
We also recognize that individuals from diverse backgrounds—especially women and underrepresented groups—may hesitate to apply if they don’t meet every requirement. If this role excites you and you’re eager to grow, we strongly encourage you to apply, even if you don’t check every box. You might bring something unique and valuable that we didn’t even know we needed.
If you need accommodations to participate in the application or interview process, perform essential job functions, or access other employment benefits, please contact us at accommodation@gorgias.com. Let’s grow together!