Title: Site Reliability Engineer
11:11 is looking for a Site Reliability Engineer (SRE) to join our SRE team and take responsibility for the site reliability, automation, and continued operation of large-scale global systems. This is a full-time position covering many large-scale critical systems within the organization, supporting internal and external users. The SRE team supports deployments both on-prem and in public cloud.
Responsibilities
- Design, build, deploy, and automate systems to improve the reliability, scalability, capacity, and efficiency of 11:11’s systems
- Implement automation to ensure repeatable and reliable rollouts of both infrastructure and code
- Following SRE methodology, analyze performance, usage patterns, and capacity, and apply the findings to improve the underlying system(s) and processes
- Work with business owners to ensure that all new products and improvements follow these standards
- Implement metric collection, monitoring, and alerting following best practices with SLAs, SLOs, and SLIs
- Handle infrastructure-related tasks such as cluster-wide upgrades and hardware maintenance
- Participate in on-call rotation
- Participate in daily team communication and travel to 11:11’s locations as needed
- Create and maintain documentation for all processes and systems
Required Skills
The following skills represent the minimum requirements to be considered for this position:
- 2+ years of recent experience as a Linux system administrator or engineer
- Experience with automation and orchestration software, such as Ansible, Salt, or Puppet
- Experience working with regionally and/or globally distributed systems
- Detail-oriented; able to focus on and resolve task-based work
- Strong documentation skills
- Excellent communication skills (English, both written and verbal) and a positive attitude
Preferred Skills
The following skills represent additional proficiencies preferred to be successful in this position:
- Experience with large-scale virtualization and storage clusters
- Experience in non-abstract large system design (NALSD)
- AWS experience (VPC, S3, EC2, EKS, IAM, and others)
- Ceph experience
- Prometheus and Grafana experience
- Kubernetes experience
- CI/CD pipeline design experience
- Database experience, such as MySQL or PostgreSQL
- Experience with core infrastructure components such as NTP and DNS
- Experience with open-source software and interacting with open-source communities
- Robust networking knowledge and experience
- Basic software development experience
- Expert-level experience with vendors such as VMware, Veeam, and Zerto