Description

Company Overview

With 80,000 customers across 150 countries, UKG is the largest U.S.-based private software company in the world. And we’re only getting started. Ready to bring your bold ideas and collaborative mindset to an organization that still has so much more to build and achieve? Read on.

At UKG, you get more than just a job. You get to work with purpose. Our team of U Krewers is on a mission to inspire every organization to become a great place to work through our award-winning HR technology built for all.

Here, we know that you’re more than your work. That’s why our benefits help you thrive personally and professionally, from wellness programs and tuition reimbursement to U Choose, a customizable expense reimbursement program that can be used for more than 200 needs that best suit you and your family, from student loan repayment to childcare to pet insurance. Our inclusive culture, active and engaged employee resource groups, and caring leaders value every voice and support you in doing the best work of your career. If you’re passionate about our purpose (people), then we can’t wait to support whatever gives you purpose. We’re united by purpose, inspired by you.

Site Reliability Engineers at UKG are team members who have a breadth of knowledge encompassing all aspects of service delivery. They develop software solutions to enhance, harden, and support our service delivery processes. This can include building and managing CI/CD deployment pipelines, automated testing, capacity planning, performance analysis, monitoring, alerting, chaos engineering, and auto-remediation.

Site Reliability Engineers must have a passion for learning and evolving with current technology trends. They strive to innovate and are relentless in their pursuit of a flawless customer experience. They have an “automate everything” mindset, helping us bring value to our customers by deploying services with incredible speed, consistency, and availability.

Primary/Essential Duties and Key Responsibilities:

• Proficient in Splunk/ELK and Datadog.
• Experience with observability tools such as Prometheus/InfluxDB and Grafana.
• Possesses strong knowledge of at least one scripting language such as Python, Bash, PowerShell, or another relevant language.
• Design, develop, and maintain observability tools and infrastructure.
• Collaborate with other teams to ensure observability best practices are followed.
• Develop and maintain dashboards and alerts for monitoring system health.
• Troubleshoot and resolve issues related to observability tools and infrastructure.
• Engage in and improve the lifecycle of services from conception to EOL, including system design consulting and capacity planning
• Define and implement standards and best practices related to system architecture, service delivery, metrics, and the automation of operational tasks
• Support services, product & engineering teams by providing common tooling and frameworks to deliver increased availability and improved incident response.
• Improve system performance, application delivery and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis
• Collaborate closely with engineering professionals within the organization to deliver reliable services
• Identify and eliminate operational toil by treating operational challenges as a software engineering problem
• Actively participate in incident response, including on-call responsibilities
• Partner with stakeholders to influence and help drive the best possible technical and business outcomes
• Guide junior team members and serve as a champion for Site Reliability Engineering

Qualifications (Experience, Education, Certification, License and Training):

• Engineering degree, or a degree in a related technical discipline, and 10+ years of experience in SRE.
• Experience coding in higher-level languages (e.g., Python, JavaScript, C++, or Java)
• Knowledge of cloud-based applications and containerization technologies
• Demonstrated understanding of best practices in metric generation and collection, log aggregation pipelines, time-series databases, and distributed tracing
• Ability to analyze the technology and engineering practices currently used within the company and develop steps and processes to improve and expand upon them
• Working experience with industry-standard tools such as Terraform and Ansible.
• Must have hands-on experience working within Engineering or Cloud.
• Experience with public cloud platforms (e.g. GCP, AWS, Azure)
• Experience in configuration and maintenance of applications & systems infrastructure
• Experience with distributed system design and architecture
• Experience building and managing CI/CD Pipelines

UKG

Sr Principal Site Reliability Engineer
