Equifax

Site Reliability Engineer

Toronto, Ontario Canada
PostgreSQL GCP Kubernetes Python Shell MongoDB AWS Groovy Docker Terraform
Description

Synopsis of the role 

Site Reliability Engineering (SRE) combines software and systems engineering to create scalable and highly reliable software systems. SREs are responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services.

What experience you need

  • 8-10 years of hands-on experience in DevOps engineering, reliability engineering, and production support for large-scale IT systems on cloud platforms such as GCP and AWS

  • Solid hands-on experience with Kubernetes (GKE, EKS)

  • Strong scripting skills (Python, Shell, Groovy) 

  • Good command of Linux, cloud networking, and Docker

  • Ability to understand and write CI/CD automation pipelines in Jenkins

  • Capable of writing infrastructure as code using Terraform

  • Exposure to maintaining databases such as MongoDB and PostgreSQL

What you’ll do

  • Design, architect, and develop cloud-native solutions on Google Cloud Platform using services such as GKE, Cloud Functions, Cloud SQL, BigQuery, Pub/Sub, Composer, and Dataflow

  • Build and own infrastructure as Terraform code, and maintain a high-quality codebase

  • Work closely with development teams to eliminate repetitive processes through automation (Jenkins, Python, Groovy, gcloud)

  • Troubleshoot production incidents using tools such as Datadog, Google Cloud Operations suite, Grafana, and ChaosSearch

  • Participate in the SRE team’s on-call rotations, respond to incidents, and provide expert support in resolving customer-impacting production issues

  • Plan and implement disaster recovery (DR) for these systems, and conduct regular DR tests to ensure business continuity in the event of a disaster

  • Actively contribute to SRE operational artifacts, such as:

    • Engineering documentation

    • Standard operating procedures

  • Optimize cloud costs for the resources owned by the SRE team

  • Proactively stay on top of security scans and reports to keep systems secure, and regularly patch all cloud resources

What could set you apart

  • Good exposure to security patching of resources on Google Cloud

  • Ability to document engineering solutions and share that information across the team

  • Ability to help with developing standard operating procedures for SRE operations within the company

  • Willingness to study official product documentation in order to build technically sound and secure systems

  • Exposure to Vertex AI on Google Cloud

  • Exposure to maintaining databases such as MongoDB and PostgreSQL

  • Availability to work extended hours during production incidents and production changes.

Primary Location:

CAN-Toronto-5700 Yonge

Function:

Function - Tech Engineering and Service Ops

Schedule:

Full time