Description
WHO WE ARE:
EOS IT Solutions is a Global Technology and Logistics company, providing Collaboration and Business IT Support services to some of the world's largest industry leaders, delivering forward-thinking solutions based on multi-domain architecture. Customer satisfaction and commitment to superior quality of service are our top business priorities, along with investing in and supporting our partners and employees.
We are a true International IT provider and are proud to deliver our services through global simplicity with trusted transparency.
POSITION OVERVIEW:
EOS IT Solutions is seeking a technically proficient Service Reliability Engineer to join our managed services infrastructure engineering team, supporting advanced collaboration technologies in a fast-paced, industry-leading environment. The ideal candidate is a highly motivated technical enthusiast with a strong foundation in IT, networking, and collaboration technologies, and a passion for continuous learning.
WHAT YOU'LL DO:
Troubleshoot and resolve technical issues related to collaboration technologies, networking, and infrastructure, utilizing advanced diagnostic tools and techniques
Perform routine maintenance, upgrades, and patching on collaboration systems, network infrastructure, and associated hardware and software components
Contribute to the development and implementation of automation solutions using scripting languages like Bash, Python, and industry-standard frameworks to streamline infrastructure management tasks
Monitor system performance, perform capacity planning, and optimize infrastructure for maximum efficiency and reliability
Ensure the security and compliance of collaboration systems by implementing and maintaining industry best practices and standards
Work closely with the Service Delivery Manager and other team members to provide timely technical support to clients, ensuring high-quality service and adherence to SLAs
Participate in cross-functional projects and collaborate with other teams, such as network, security, and cloud teams, to ensure seamless integration of collaboration solutions
Maintain up-to-date documentation of technical processes, systems, configurations, and network topologies
Efficiently handle live production incidents, debug and troubleshoot application and infrastructure issues, and follow and implement SRE best practices
Provide on-call support as an escalation point for VC infrastructure and network troubleshooting
Build end-to-end monitoring infrastructure (Logging, Metrics, Tracing) and work closely with the other Production Engineers to provide the right tooling to measure the reliability of our systems
Collaborate with development and operations teams to ensure availability and reliability of the application and infrastructure
Work closely with software engineers and QAs to ensure the system is responding properly to non-functional requirements such as performance, security, and availability
Perform testing and quality assurance around software and hardware used in our environment
WHAT YOU NEED TO SUCCEED:
At least 3 years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role.
Linux expertise
Support of internet-facing production services and distributed systems via deployments, on-call, and incident management.
Proficiency in implementing and coordinating telemetry using monitoring and observability tools like Splunk, Grafana, and Prometheus, or similar.
Experience in troubleshooting and resolving issues in VMware from both an operating system and application perspective.
Understanding of ITIL processes, service management principles, and IT service delivery best practices
Building and operating container orchestration systems like VMware.
Designing, building and maintaining infrastructure with a cloud provider such as AWS.
Automation advocate - prior history of removing operational toil via software.
Self-motivated, inquisitive, and always looking to learn more.
Familiarity with scripting languages like Bash, Python, or similar, and experience with REST APIs
Experience with systems automation tools like Chef, Ansible, Terraform, or similar.
ADDITIONAL REQUIREMENTS:
Strong foundational knowledge in networking protocols, infrastructure, and troubleshooting techniques, including TCP/IP, DNS, DHCP, VLANs, and routing protocols
Disaster recovery and capacity planning.
Strong communication and interpersonal skills, with the ability to work effectively in a team-oriented environment
Self-motivated and eager to learn new technologies, tools, and methodologies
EOS is committed to creating a diverse and inclusive work environment and is proud to be an equal opportunity employer. We invite you to consider opportunities at EOS regardless of your gender; gender identity; gender reassignment; age; religious or similar philosophical belief; race; national origin; political opinion; sexual orientation; disability; marital or civil partnership status or other non-merit factor.