About us:
Working at Tech Holding isn't just a job; it's an opportunity to be part of something bigger. We are a full-service consulting firm founded on the premise of delivering predictable outcomes and high-quality solutions to our clients. Our founders and team members have held senior positions in a wide variety of companies, from emerging startups to large Fortune 50 firms, and we have combined those experiences into a unique approach built on the principles of deep expertise, integrity, transparency, and dependability.
The Role:
We are seeking a highly skilled and experienced Senior Site Reliability Engineer to join our growing team. You will play a central role in ensuring the reliability, scalability, and performance of our critical infrastructure and applications. Beyond core SRE responsibilities, you will also serve as a key liaison across teams, fostering collaboration and ensuring seamless operations.
Responsibilities:
Site Reliability Engineering:
- Proactively identify and mitigate potential issues impacting infrastructure and applications.
- Partner with development teams to implement best practices for building reliable and scalable systems.
- Stay up to date on the latest SRE trends and technologies.
Monitoring and Observability:
- Design, implement, and maintain robust monitoring solutions using tools like Prometheus and Grafana.
- Develop and configure alerts within tools like PagerDuty to ensure timely notification of potential issues.
- Analyze and troubleshoot issues using collected application and infrastructure metrics.
Incident Management:
- Lead incident response, ensuring timely resolution and minimizing downtime.
- Document and communicate incident details effectively to stakeholders.
- Conduct post-incident reviews to identify root causes and implement preventative measures.
Service Level Agreements (SLAs):
- Collaborate with product and engineering teams to define clear and measurable SLAs for our SaaS offerings.
- Establish Service Level Objectives (SLOs) for key metrics based on SLA requirements.
- Define Service Level Indicators (SLIs) to track progress towards achieving SLOs.
- Monitor SLO compliance and proactively identify potential SLA breaches.
Automation:
- Identify opportunities for automation to improve efficiency and reliability.
- Develop and implement automation scripts using tools like Python or Bash.
- Automate routine tasks and incident response workflows.
Cross-Team Collaboration:
- Act as a liaison between SRE, Product, Security, Application Engineering, and Customer Operations teams.
- Facilitate communication and information sharing across teams to ensure smooth operations.
- Work collaboratively to define and implement solutions that meet the needs of all stakeholders.
Mentorship and Knowledge Sharing:
- Mentor and collaborate with junior SREs.
- Share knowledge and best practices within the team.
- Contribute to the development and documentation of internal SRE processes.
Required Skills:
- 5-8 years of experience as a Site Reliability Engineer (SRE) or in a related role.
- Experience with Google Cloud Platform (GCP).
- Proven experience with monitoring tools like Prometheus and Grafana.
- Strong understanding of incident management best practices.
- Experience with alerting tools like PagerDuty.
- Experience with scripting languages like Python or Bash for automation.
- Excellent communication and collaboration skills.
- Ability to work independently and as part of a team.
- Strong problem-solving and analytical skills.
- Passion for building reliable and scalable systems.
Nice to Have:
- Experience with container orchestration platforms like Kubernetes.
- Experience with chaos engineering principles.
- Experience with configuration management tools like Ansible or Chef.
What we offer:
- Remote Work Opportunities
- Flexible Work Hours