Description

What we do

At Perlego, there are over 100 of us working hard to make education accessible to all. In this digital age, we believe that anyone should be able to learn anything at any time. Knowledge should be more accessible, not locked behind sky-high price tags.

Over the past 5 years, our goal has been to help students across the UK and Europe access quality books. The next stage of Perlego is twofold: 1) expand our support to students globally, and 2) build a product that goes beyond the book: a platform that helps students study smarter and more effectively.

What we're looking for:

We are looking for an experienced Site Reliability Engineer (SRE) with a strong background in AWS services and monitoring tools. In this role, you will ensure the availability and reliability of our services, especially outside office hours, since most of the team is based in Europe and India. You will be integral to addressing issues swiftly, resolving incidents independently, and thriving in a fast-paced environment.

How we collaborate:

Our organization operates across multiple time zones, with teams based across Europe. As an SRE, you will provide critical support during off-hours, working autonomously to resolve issues while collaborating closely with our teams to ensure continuous service availability. You will be part of a global team, supporting cloud infrastructure and platform initiatives.

What you’ll do:

As a Site Reliability Engineer, your main focus will be to ensure our services remain highly available and performant. Key responsibilities include:

Monitoring & Incident Management:

Monitor and manage platform activity using tools like Datadog, Prometheus, Grafana, or AWS CloudWatch.
Respond quickly to alerts and incidents, independently resolving issues and ensuring service uptime during off-peak hours.
Conduct post-incident reviews and help improve system resiliency through automation and monitoring enhancements.

Cloud Infrastructure Management:

Manage and support AWS infrastructure, focusing on scalability, security, and reliability.
Handle deployments, managing CI/CD pipelines for both containerized (Docker/Kubernetes) and serverless (AWS Lambda) applications.
Ensure effective backup, recovery, and disaster recovery strategies to minimize downtime.

Collaboration & Communication:

Collaborate with cross-functional teams to implement platform improvements.
Work independently and make swift decisions when managing service incidents outside core business hours.
Assist in platform security, ensuring adherence to best practices for cloud security and compliance.

Continuous Improvement:

Automate manual processes to reduce human error and improve efficiency.
Continuously enhance monitoring systems, ensuring robust early detection and resolution capabilities.
Identify potential performance bottlenecks and contribute to overall platform optimization.

This role is ideal for you if you possess:

Experience in Site Reliability Engineering, DevOps, or a similar field.
Strong experience with AWS services.
Expertise in using monitoring tools (e.g. Prometheus, Grafana, CloudWatch) for real-time platform performance insights.
Hands-on experience with CI/CD pipeline management for deploying containerized (Docker) and serverless applications.
Proficiency in Linux-based operating systems and shell scripting.
Familiarity with Infrastructure as Code tools (Terraform, CloudFormation).
Experience with incident management, troubleshooting, and platform recovery in high-pressure environments.
Strong communication skills with a proven ability to work both independently and collaboratively across time zones.

⭐️ It’s a plus if you have:

Experience working in a global, distributed team providing off-hours support.
Knowledge of container orchestration tools.
Previous experience with SecOps and cloud security best practices.
Familiarity with scaling highly available systems in a fast-paced, growth-oriented environment.

Perlego

Site Reliability Engineer (Remote)
