Axelerant

Senior Software Engineer, Site Reliability, Digital (Remote)

Remote India
Description

Senior Software Engineer - Site Reliability

Department: Digital

Employment Type: Full Time

Location: India (Remote)


We are seeking an accomplished Senior Site Reliability Engineer (SRE) to lead the design, implementation, and evolution of highly available, scalable, and resilient systems across our multi-cloud infrastructure. In this senior role, you will drive architectural decisions, establish reliability standards, and mentor teams while ensuring operational excellence across complex distributed systems. You will partner with engineering leadership, development teams, and product stakeholders to shape infrastructure strategy, implement sophisticated automation, and champion a culture of reliability engineering.

At Axelerant, we are committed to fostering an environment where innovation and operational excellence thrive. As a Senior SRE, you'll tackle sophisticated, large-scale challenges using cutting-edge technologies across AWS and Azure platforms. You will lead critical initiatives that impact system reliability at scale, architect solutions for complex infrastructure problems, and guide teams in adopting industry-leading practices that drive meaningful improvements across our entire technology ecosystem.

Key Responsibilities

  • Architect and implement highly reliable, scalable, and cost-effective infrastructure solutions for mission-critical applications across multi-cloud environments (AWS and Azure).
  • Lead the definition and refinement of service level objectives (SLOs), service level indicators (SLIs), and error budgets, establishing reliability standards across the organization.
  • Design and implement sophisticated Infrastructure as Code (IaC) solutions using Terraform, Ansible, and Azure Resource Manager (ARM) templates or Bicep.
  • Drive automation strategies to eliminate toil, improve operational efficiency, and enable self-service capabilities for development teams.
  • Lead incident response efforts, conduct thorough post-incident reviews, and implement systemic improvements to prevent recurrence.
  • Champion cloud-native architectures and modern reliability practices, serving as a technical advisor for infrastructure and platform decisions.
  • Mentor junior SREs and engineers, fostering a culture of reliability, observability, and continuous improvement.
  • Participate in and help optimize the on-call rotation, ensuring sustainable practices and effective escalation procedures.
  • Establish and maintain comprehensive documentation standards, runbooks, and knowledge repositories that enable team autonomy and effective incident response.
  • Design and implement advanced monitoring, logging, and alerting strategies using observability platforms to enable proactive issue detection and resolution.
  • Lead container orchestration initiatives using Kubernetes (AKS, EKS) and implement sophisticated deployment strategies including blue-green, canary, and progressive delivery patterns.
  • Ensure security, compliance, and governance standards are embedded throughout the infrastructure lifecycle, implementing security-as-code practices.
  • Drive capacity planning, performance optimization, and cost management initiatives across cloud platforms.
  • Collaborate with architecture and security teams to establish platform standards, reference architectures, and best practices.

Skills, Knowledge and Expertise

  • 5+ years of proven experience as a Site Reliability Engineer or similar role, with demonstrated expertise in designing, implementing, and operating large-scale, distributed systems.
  • Deep expertise in Infrastructure as Code (IaC) with Terraform and Ansible, including module development, state management, and multi-environment orchestration.
  • Extensive hands-on experience with both AWS and Azure cloud platforms, including advanced services, networking, and security features in both environments.
  • Expert-level knowledge of container orchestration with Kubernetes, including architecture, custom resource definitions (CRDs), operators, service mesh implementations, and production-scale cluster management.
  • Advanced proficiency in Linux system administration, performance tuning, and troubleshooting complex system-level issues.
  • Proven experience implementing GitOps workflows using ArgoCD, Flux, or similar tools, including advanced deployment patterns and progressive delivery.
  • Deep understanding of observability principles and hands-on experience with tools such as Prometheus, Grafana, Datadog, Azure Monitor, or the ELK stack.
  • Expert knowledge of networking concepts, including load balancing, CDNs, DNS, VPNs, service mesh architectures, and distributed systems communication patterns.
  • Strong programming and scripting capabilities in Python, Bash, Go, or PowerShell, with the ability to develop custom tooling and automation frameworks.
  • Extensive experience designing and optimizing CI/CD pipelines using Jenkins, GitLab CI, Azure DevOps, GitHub Actions, or CircleCI.
  • Demonstrated ability to lead incident response, conduct root cause analysis, and drive systemic reliability improvements.
  • Excellent communication and leadership skills with proven ability to influence technical decisions and collaborate with stakeholders at all levels.
  • Current certification in AWS (Solutions Architect Associate/Professional or equivalent) and Azure (Azure Administrator or Azure Solutions Architect), with practical experience managing production workloads on both platforms.

Good To Have

  • Experience with hybrid and multi-cloud networking strategies, including ExpressRoute, Direct Connect, and cloud interconnects.
  • Knowledge of serverless architectures on AWS (Lambda) and Azure (Functions, Logic Apps) and their operational considerations.
  • Proven experience with disaster recovery planning, business continuity, and implementing multi-region active-active architectures.
  • Understanding of machine learning operations (MLOps), data pipeline orchestration, and supporting ML workloads in production.
  • Experience with service mesh technologies such as Istio, Linkerd, or Consul.
  • Familiarity with chaos engineering principles and tools like Chaos Monkey or Gremlin.
  • Experience with configuration management at scale and policy-as-code tools like Open Policy Agent (OPA).
  • Knowledge of FinOps principles and cloud cost optimization strategies.

What Would Success Look Like For You?

Success in this role means establishing and maintaining industry-leading reliability standards, consistently achieving or exceeding SLOs, and driving strategic initiatives that significantly enhance system resilience and operational maturity. You will be recognized for your technical leadership, ability to architect solutions that prevent classes of incidents, your impact on team capability through mentorship, and your contribution to establishing a robust reliability engineering culture across the organization.

Your Work's Impact:

Your contributions will fundamentally shape the reliability, performance, and scalability of our platform, directly enabling engineering teams to innovate with confidence and deliver exceptional value to our customers. Your architectural decisions and reliability practices will influence system design across the organization, setting standards that ensure operational excellence at scale.

Why Work At Axelerant?

We're a people-centric company, driven by our core values: Openness, Enthusiasm, and Kindness.

We highly value our people and invest in their growth and well-being through progressive benefits, which puts us among India's top 40 companies in health and well-being.

  • Excellent work exposure - Some of our recent clients were the UN, the University of East London, and Doctors Without Borders.
  • Meaningful projects to contribute back - Most of our projects are in the education, government, healthcare, and not-for-profit sectors. We also encourage and support team members for open-source contributions.
  • Work-life flexibility and remote work - You decide when and where to work. This has allowed many team members, who couldn’t have held a regular job otherwise, to have thriving careers.
  • Eight-hour workdays - We don't say eight hours and then expect twelve.
  • No micromanagement - Micromanagement makes us grunt like the Hulk, so nobody will be looking over your shoulder. But help is always available when you ask.
  • No discrimination - We believe in equal pay for equal work. Personal decisions like planning to have children will not stop you from getting promoted.
  • Championing inclusivity - We like diversity. It enriches our lives and products. If you see something that is wrong or could be better, even on day 1, share it through established channels to bring positive change. We listen.
  • Meaningful time off - 52 weekends and 40 days per year of consolidated leave, plus maternity, paternity, adoption, and sabbatical allowances. We also have Kindness leaves for emergencies.
  • Family Medical Insurance - You want your family’s health secured. So do we. We've got you, your spouse, and your little ones covered. You also get free doctor, health, and wellness consultations with medical experts whenever you need them.
  • Performance coaching - Our professional, empathetic coaches will help you become the best version of yourself through career and personal development.
  • Event sponsorship - If your session at any event is selected and aligns with sponsorship guidelines, we cover all expenses for the trip, whether domestic or international.
  • Continuing education allowance - We’ll cover up to 2% of your annual salary each year for classes, certifications, or books to further your capabilities.

There Are Many Other Progressive Benefits:
  • Health and wellness allowance
  • Generous home office set-up allowance
  • Sponsored team meet-ups
  • Co-working space allowance
  • Event allowance