11:11 Systems

Site Reliability Engineer

Remote US
Puppet AWS Kubernetes MySQL PostgreSQL CSS Ansible
Description
 

Title: Site Reliability Engineer

11:11 is looking for a Site Reliability Engineer (SRE) to join our SRE team. The role is responsible for the site reliability, automation, and continued operation of large-scale global systems. This is a full-time position covering many critical systems within the organization, supporting both internal and external users. The SRE team supports deployments both on-prem and in public cloud.

Responsibilities

  • Design, build, deploy, and automate systems to improve the reliability, scalability, capacity, and efficiency of 11:11’s systems 
  • Implement automation to ensure repeatable and reliable rollouts of both infrastructure and code 
  • Following SRE methodology, analyze performance, usage patterns, and capacity, and apply the findings to improve the underlying systems and processes
  • Work with business owners to ensure that all new products and improvements follow these standards 
  • Implement metric collection, monitoring, and alerting following best practices with SLAs, SLOs, and SLIs 
  • Handle infrastructure-related tasks such as cluster-wide upgrades and hardware maintenance
  • Participate in on-call rotation 
  • Participate in daily team communication and travel to 11:11’s locations as needed 
  • Create and maintain documentation for all processes and systems 

 

Required Skills

The following skills represent the minimum requirements to be considered for this position: 

  • 2+ years of recent experience as a Linux system administrator or engineer
  • Experience with automation and orchestration software, such as Ansible, Salt, or Puppet
  • Experience working with regionally and/or globally distributed systems 
  • Detail-oriented; able to focus on and resolve task-based work
  • Strong documentation skills 
  • Excellent communicator (English, both written and verbal) and a positive attitude 

Preferred Skills

The following skills are additional proficiencies that help a candidate succeed in this position:

  • Experience with large-scale virtualization and storage clusters
  • Experience in non-abstract large system design (NALSD) 
  • AWS experience (VPC, S3, EC2, EKS, IAM, and others) 
  • Ceph experience 
  • Prometheus and Grafana experience 
  • Kubernetes experience 
  • CI/CD pipeline design
  • Database experience, such as MySQL or PostgreSQL 
  • Experience with core infrastructure components such as NTP and DNS 
  • Experience with open-source software and interacting with open-source communities
  • Robust networking knowledge and experience 
  • Basic software development experience 
  • Expert-level experience with vendors such as VMware, Veeam, Zerto, etc.

