TriNet is a leading provider of comprehensive human resources solutions for small to midsize businesses (SMBs). We enhance business productivity by enabling our clients to outsource their HR function to one strategic partner and allowing them to focus on operating and growing their core businesses. Our full-service HR solutions include features such as payroll processing, human capital consulting, employment law compliance and employee benefits, including health insurance, retirement plans and workers’ compensation insurance.
TriNet has a nationwide presence and an experienced executive team. Our stock is publicly traded on the NYSE under the ticker symbol TNET. If you’re passionate about innovation and making an impact on the large SMB market, come join us as we power our clients’ business success with extraordinary HR.
Don't meet every single requirement? Studies have shown that women and people of color are less likely to apply to jobs unless they meet every single requirement. At TriNet, we are dedicated to building a diverse, inclusive and authentic workplace, so if you're excited about this role but your past experience doesn't align perfectly with every single qualification in the job description, we encourage you to apply anyway. You may just be the right candidate for this or other roles.
JOB SUMMARY
The role of a Senior Observability Engineer is to design, implement, and maintain comprehensive observability solutions for complex systems and applications. This position requires a deep understanding of monitoring and observability practices, as well as expertise in using various tools and technologies to collect and analyze performance, logging, and metrics data.
Essential Duties/Responsibilities
- Monitoring Setup and Configuration: Set up and configure the monitoring tools to collect data from various systems, applications, and network components. This involves defining monitoring metrics, configuring data collection agents, and ensuring proper connectivity and access.
- Alert Management: Monitor alerts generated by the tools and perform triage to identify critical issues. Analyze alert patterns, fine-tune alert thresholds, and configure alert escalation workflows to ensure timely response and resolution.
- Performance Analysis and Troubleshooting: Utilize the tools' features and functionalities to analyze performance metrics, logs, and traces. Conduct investigations and root cause analysis to troubleshoot and resolve performance issues, identifying bottlenecks and areas for optimization.
- Incident Response: Collaborate with cross-functional teams to respond to and resolve incidents in a timely manner. Engage in incident management processes, including incident triage, communication, and coordination with relevant stakeholders, and participate in post-incident reviews to identify areas for improvement.
- Dashboard and Visualization: Create and maintain dashboards and visualizations using tools like Grafana, providing a consolidated view of system health, performance, and key metrics. Customize dashboards to meet specific business and operational requirements and share them with relevant teams and stakeholders.
- Capacity Planning and Scalability: Monitor resource utilization and performance trends to forecast capacity requirements. Collaborate with capacity planning teams to plan and provision resources based on anticipated growth and workload patterns, ensuring scalability and optimal performance.
- Tool Administration and Maintenance: Perform routine administration tasks for the observability tools, such as user management, access control, and system upgrades or patching. Monitor the health and availability of the tools themselves, ensuring their reliability and functionality.
- Documentation and Knowledge Sharing: Document monitoring configurations, troubleshooting procedures, and best practices for future reference. Contribute to internal knowledge bases and collaborate with the team to share insights and lessons learned.
- Tool Integration and Automation: Integrate observability tools with other systems and workflows, such as ticketing systems, incident management platforms, and automation frameworks. Automate monitoring configurations, data collection, and reporting processes to improve efficiency and reduce manual effort.
- Continuous Improvement and Research: Stay updated with the latest developments in observability practices and technologies. Research and evaluate new tools and techniques that could enhance the monitoring and observability capabilities of the organization. Continuously improve existing monitoring setups, workflows, and processes to align with industry best practices.
- Performs other duties as assigned
- Complies with all policies and standards
QUALIFICATIONS
Education
- Bachelor's Degree in computer science or other highly technical, scientific subject area preferred
Work Experience
- Typically 5+ years of experience with systems engineering and/or information technology
Knowledge, Skills and Abilities
- Demonstrated knowledge of and experience administering application and cloud infrastructure monitoring
- Hands-on experience with Prometheus and Grafana
- Hands-on experience with Elasticsearch (AWS OpenSearch) and Oracle Logging Analytics, or similar tools such as Datadog, Splunk, or Sumo Logic
- Hands-on experience with the APM tool AppDynamics, or similar tools such as Dynatrace or New Relic
- Scripting Language experience (Python preferred)
- Strong understanding of web services and Swagger is a plus
- Experience with CI/CD pipelines
- Ability to thrive in a fun, fast-paced environment
- Ability to excel at problem solving, adapt easily to change, and contribute effectively both individually and as part of cross-functional teams.
- Proficiency in Infrastructure as Code (IaC), particularly CDK and Terraform, is highly desirable.
- Passion for DevOps, Application/API monitoring, automation, and reliability
Work Environment:
- Work in a clean, pleasant, and comfortable home or office setting. The work environment characteristics described here are representative of those an employee encounters while performing the essential functions of this job. Reasonable accommodations may be made to enable persons with disabilities to perform the essential functions.
- Position may be considered remote and requires reliable and consistent internet service.
Travel Requirements
Minimal
The salary range for this role is $76,000 to $182,400. The candidate’s final salary offer will be based on the candidate’s skills, education, work location and experience.
A candidate’s compensation may also include bonuses consistent with TriNet’s corporate bonus plan.
Additionally, subject to applicable eligibility requirements, TriNet offers permanent full-time employees a variety of benefits including medical, dental, and vision plans, life and disability insurance, a 401(k) savings plan, an employee stock purchase plan, eleven (11) Company-observed holidays, PTO and a comprehensive leave program. Please click the following link for detailed information about our benefits offerings: https://www.trinet.com/documents/blt5b61a1040aae1904
Please Note: TriNet reserves the right to change or modify job duties and assignments at any time. The above job description is not all encompassing. Position functions and qualifications may vary depending on business necessity.
TriNet is an Equal Opportunity Employer and does not discriminate against applicants based on race, religion, color, disability, medical condition, legally protected genetic information, national origin, gender, sexual orientation, marital status, gender identity or expression, sex (including pregnancy, childbirth or related medical conditions), age, veteran status or other legally protected characteristics. Any applicant with a mental or physical disability who requires an accommodation during the application process should contact recruiting@trinet.com to request such an accommodation.