Chewy

Senior Site Reliability Engineer

Minneapolis, MN
Azure Terraform Java JavaScript React GCP Ansible Go Shell Docker AWS Python Kubernetes Microservices
Description

Our Opportunity:

Chewy is seeking a Site Reliability Engineer III in Dallas, TX; Plantation, FL; Boston, MA; or Minneapolis, MN. Chewy is THE go-to online shopping destination for all things pet, and we continuously strive to delight pet parents with a seamless experience across our platforms. The SRE team works with teams across the organization to make their services more resilient to failure by applying common patterns and practices, and to scale those services to keep up with ever-increasing growth and demand. This includes facilitating resiliency testing, game day exercises, and chaos testing to uncover risks and weaknesses before they lead to large-scale production issues.

Do you enjoy working in a fast-paced environment, solving complex technical problems, and delivering innovative solutions? If you have a passion for solving complex problems unique to running large, highly scalable, resilient systems, we would love for you to join us! The role will have tremendous visibility in the technology & business organization of Chewy. This is a high-profile position that will have exposure across the entire business, influencing the vision and implementation of architecture, design and features of Chewy’s technical platform.

What You’ll Do:

  • Contribute to the development of our self-service chaos platform.
  • Enable engineering teams to make their services more reliable by identifying, creating, and deploying engineering practices, processes, and solutions.
  • Establish monitoring tools and management dashboards integrated into platforms, with best-practice notifications and response processes.
  • Define and document best practices and strategies regarding application deployment and infrastructure maintenance.
  • Educate teams on the implementation of new cloud-based initiatives, providing associated training as required.
  • Employ exceptional problem-solving skills, with the ability to see and solve issues before they affect business productivity.
  • Improve availability, reliability, and observability of Chewy services and reduce the burden of human toil with tooling and automation.

What You’ll Need:

  • 7+ years of experience in a software engineering, SRE, or performance engineering role.
  • Programming experience in one or more of Python, Go, Shell, Java, and JavaScript/React.
  • 5+ years of hands-on experience designing and developing scalable, high-performing, and fault-tolerant applications for large enterprises.
  • Expertise in developing executive-friendly dashboards based on observable metrics in IT systems (KPIs, incident trends, MTTR, MTTD, etc.).
  • Hands-on working experience with issue tracking tools and source control systems (GitHub).
  • Experience with infrastructure tools, container technology (Docker), public cloud providers (AWS, Google Cloud, Azure), configuration and deployment management (Terraform, Ansible), continuous delivery infrastructure (e.g., Jenkins), and orchestration (Kubernetes, Fargate).
  • Excellent understanding of micro-services architecture, design patterns, and standard methodologies with an eye towards scale, automation, resiliency, and high availability.
  • Experience with telemetry tooling and observability systems such as Prometheus, Splunk, Datadog, and Grafana.
  • Ability to leverage automation to improve deployments and updates, speed up problem detection and resolution, and ensure safe and quick rollback when problems occur.
  • A Bachelor’s degree in Computer Science or related field or equivalent experience.
  • Position may require travel.

Bonus:

  • CDN & DNS experience is a plus.
  • Incident management and on-call experience.
  • Experience contributing to the architecture and design (design patterns, resiliency, and scaling) of new and current systems.
  • Expertise in ITSM processes and tools such as JIRA and PagerDuty, and experience with ServiceNow ITOM and ITSM modules that focus on Incident, Problem, and Change Management.

Chewy is committed to equal opportunity. We value and embrace diversity and inclusion of all Team Members. If you have a disability under the Americans with Disabilities Act or similar law, and you need an accommodation during the application process or to perform these job requirements, or if you need a religious accommodation, please contact CAAR@chewy.com.

 

If you have a question regarding your application, please contact HR@chewy.com.

 

To access Chewy's Customer Privacy Policy, please click here. To access Chewy's California CPRA Job Applicant Privacy Policy, please click here.


