Synopsis of the role
Site Reliability Engineering (SRE) combines software and systems engineering to create scalable and highly reliable software systems. SREs are responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services.
What experience you need
8-10 years of hands-on experience in DevOps engineering, reliability engineering, and production support for large-scale IT systems on cloud platforms such as GCP and AWS
Good hands-on experience with Kubernetes (GKE, EKS)
Strong scripting skills (Python, Shell, Groovy)
Good command of Linux, cloud networking, and Docker
Ability to understand and write CI/CD automation pipelines in Jenkins
Capable of coding infrastructure using Terraform
Exposure to maintaining databases such as MongoDB and Postgres
What you’ll do
Design, architect, and develop cloud-native solutions on Google Cloud Platform using services such as GKE, Cloud Functions, Cloud SQL, BigQuery, Pub/Sub, Composer, and Dataflow
Build and own infrastructure through Terraform code and maintain a high-quality code base
Work closely with development teams to eliminate repetitive processes through automation (Jenkins, Python, Groovy, gcloud)
Troubleshoot production incidents using tools such as Datadog, Google Cloud Operations suite, Grafana, and ChaosSearch
Participate in the SRE team's on-call rotations, respond to incidents, and provide expert support in resolving customer-impacting production issues
Plan and implement disaster recovery (DR) for the systems and conduct regular DR tests to ensure business continuity in the event of a disaster
Actively contribute to SRE operational artifacts, such as engineering documentation and standard operating procedures
Perform cloud cost optimization on the resources owned by SRE
Proactively keep up with security scans and reports to maintain secure systems, and regularly patch all cloud resources
What could set you apart
Good exposure to security patching of resources on Google Cloud
Ability to document engineering solutions and share the information across the team
Ability to help with developing standard operating procedures for SRE operations within the company
Willingness to study official product documentation in order to build correct, secure systems that follow documented best practices
Exposure to Vertex AI on Google Cloud is a plus
Exposure to maintaining databases such as MongoDB and Postgres
Availability to work extended hours during production incidents and production changes.
Primary Location: CAN-Toronto-5700 Yonge
Function: Tech Engineering and Service Ops
Schedule: Full time