Description

Title: Site Reliability Engineer

11:11 is looking for a Site Reliability Engineer (SRE) to join our SRE team. The role is responsible for the reliability, automation, and continued operation of large-scale global systems. This is a full-time position covering many critical systems within the organization and supporting both internal and external users. The SRE team supports deployments both on-premises and in the public cloud.

Responsibilities

Design, build, deploy, and automate systems to improve the reliability, scalability, capacity, and efficiency of 11:11’s systems
Implement automation to ensure repeatable and reliable rollouts of both infrastructure and code
Following SRE methodology, analyze performance, usage patterns, and capacity, and apply the findings to improve the underlying systems and processes
Work with business owners to ensure that all new products and improvements follow these standards
Implement metric collection, monitoring, and alerting following best practices with SLAs, SLOs, and SLIs
Handle infrastructure-related tasks such as cluster-wide upgrades and hardware maintenance
Participate in on-call rotation
Participate in daily team communication and travel to 11:11’s locations as needed
Create and maintain documentation for all processes and systems

Required Skills

The following skills represent the minimum requirements to be considered for this position:

2+ years of recent experience as a Linux system administrator or engineer
Experience with automation and orchestration software, such as Ansible, Salt, or Puppet
Experience working with regionally and/or globally distributed systems
Detail-oriented; able to focus on and resolve task-based work
Strong documentation skills
Excellent communicator in English, both written and verbal, with a positive attitude

Preferred Skills

The following additional proficiencies are preferred for success in this position:

Experience with large-scale virtualization and storage clusters
Experience in non-abstract large system design (NALSD)
AWS experience (VPC, S3, EC2, EKS, IAM, and others)
Ceph experience
Prometheus and Grafana experience
Kubernetes experience
CI/CD pipeline design
Database experience, such as MySQL or PostgreSQL
Experience with core infrastructure components such as NTP and DNS
Experience with open-source software and interacting with open-source communities
Robust networking knowledge and experience
Basic software development experience
Expert-level experience with vendor products such as VMware, Veeam, and Zerto

11:11 Systems

Information Services
