EOS

Service Reliability Engineer

Austin, TX US
USD 100k - 115k
Chef Ansible Terraform Bash Python
Description
WHO WE ARE:

EOS IT Solutions is a Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world's largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture. Customer satisfaction and commitment to superior quality of service are our top business priorities, along with investing in and supporting our partners and employees.

We are a true International IT provider and are proud to deliver our services through global simplicity with trusted transparency.

POSITION OVERVIEW:

EOS IT Solutions is seeking a technically proficient Service Reliability Engineer to join our managed services infrastructure engineering team, supporting advanced collaboration technologies in a fast-paced, industry-leading environment. The ideal candidate is a highly motivated technical enthusiast with a strong foundation in IT, networking, and collaboration technologies, and a passion for continuous learning.

WHAT YOU'LL DO:

Troubleshoot and resolve technical issues related to collaboration technologies, networking, and infrastructure, utilizing advanced diagnostic tools and techniques
Perform routine maintenance, upgrades, and patching on collaboration systems, network infrastructure, and associated hardware and software components
Contribute to the development and implementation of automation solutions using scripting languages like Bash, Python, and industry-standard frameworks to streamline infrastructure management tasks
Monitor system performance, perform capacity planning, and optimize infrastructure for maximum efficiency and reliability
Ensure the security and compliance of collaboration systems by implementing and maintaining industry best practices and standards
Work closely with the Service Delivery Manager and other team members to provide timely technical support to clients, ensuring high-quality service and adherence to SLAs
Participate in cross-functional projects and collaborate with other teams, such as network, security, and cloud teams, to ensure seamless integration of collaboration solutions
Maintain up-to-date documentation of technical processes, systems, configurations, and network topologies
Efficiently handle live production incidents, debug and troubleshoot application and infrastructure issues, and follow and implement SRE best practices
Provide On-Call support as an escalation point for VC infrastructure and network troubleshooting
Build end-to-end monitoring infrastructure (Logging, Metrics, Tracing) and work closely with the other Production Engineers to provide the right tooling to measure the reliability of our systems
Collaborate with development and operations teams to ensure availability and reliability of the application and infrastructure
Work closely with software engineers and QAs to ensure the system is responding properly to non-functional requirements such as performance, security, and availability
Perform testing and quality assurance around software and hardware used in our environment

WHAT YOU NEED TO SUCCEED:

At least 3 years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
Linux expertise
Support of internet-facing production services and distributed systems via deployments, on-call, and incident management.
Proficiency in implementing and coordinating telemetry using monitoring and observability tools like Splunk, Grafana, and Prometheus, or similar.
Experience troubleshooting and resolving issues in VMware from both an operating system and application perspective.
Understanding of ITIL processes, service management principles, and IT service delivery best practices
Building and operating container orchestration systems like VMware.
Designing, building and maintaining infrastructure with a cloud provider such as AWS.
Automation advocate - prior history of removing operational toil via software.
Self-motivated, inquisitive, and always looking to learn more.
Familiarity with scripting languages like Bash, Python, or similar, and experience with REST APIs
Experience with systems automation tools like Chef, Ansible, Terraform, or similar.

ADDITIONAL REQUIREMENTS:

Strong foundational knowledge in networking protocols, infrastructure, and troubleshooting techniques, including TCP/IP, DNS, DHCP, VLANs, and routing protocols
Disaster recovery and capacity planning.
Strong communication and interpersonal skills, with the ability to work effectively in a team-oriented environment
Self-motivated and eager to learn new technologies, tools, and methodologies

EOS is committed to creating a diverse and inclusive work environment and is proud to be an equal opportunity employer. We invite you to consider opportunities at EOS regardless of your gender; gender identity; gender reassignment; age; religious or similar philosophical belief; race; national origin; political opinion; sexual orientation; disability; marital or civil partnership status or other non-merit factor.
