Why do you charge job seekers to use EchoJobs?

Unlike bigger job sites, we treat job seekers as our customers. We charge a small fee to give you curated access to the best companies and up-to-date jobs, and that focus lets us deliver a more personalized and effective job search experience.

How many software engineering jobs are on EchoJobs?

We've got over 200,000 jobs from 15,000+ vetted companies. No fake or sleazy jobs here!

So, where do the jobs come from?

We aggregate jobs from 15,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.

What makes EchoJobs different?

We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️

How often are new jobs added?

Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀

How fast can I find a job?

Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯

How often should I check EchoJobs?

Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅

Description

Platform Reliability Engineer (SRE / DevSecOps)

Location: Bengaluru, India

Department: Technology

Skills: CI/CD, Platform Engineering, Cloud Infrastructure, Incident Management

About the role

We are building an AI-native software factory that rapidly brings SaaS products to market. Our product engineers use AI-assisted development tools to build fast. We are hiring a senior, hands-on Platform Reliability Engineer to make sure every product we launch is production-grade: deployable, observable, secure, scalable, resilient, and cost-efficient.

You will own the shared production layer across our portfolio. Your job is to turn “it works” into “it runs reliably for customers.” You will define the standards, tooling, and operating practices that let a small engineering team launch and maintain many products without operational chaos.

What you will own

This role owns productionization across the service lifecycle: deployment standards, production readiness reviews, observability, SLOs and alerting, automated recovery, scalability, security hardening, and incident response. The goal is to reduce manual operational toil by turning repeatable ops work into software, templates, and automation.

Responsibilities

Build and own the “golden path” for launching and operating products in production:

infrastructure templates
CI/CD pipelines
environment provisioning
secrets management
DNS, SSL, and edge configuration
rollout and rollback workflows
backups and restore testing
monitoring, logging, tracing, dashboards, and alerts

Define and enforce a Production Readiness Review for every launch, covering reliability, security, scalability, rollback, observability, and recovery.

Define service-level indicators and service-level objectives for each product, and build alerting tied to customer impact rather than noisy infra events.

Architect and operate reliable cloud infrastructure for multi-product SaaS workloads:

autoscaling
load balancing
caching
queues and background jobs
database reliability
failover and disaster recovery
capacity planning and performance tuning

Own runtime and cloud security hardening:

IAM and least-privilege access
secret rotation and key management
dependency and container scanning
patching and vulnerability management
network boundaries and service-to-service access
audit logging
WAF/CDN and edge protections
secure release controls

Lead incident response for production issues:

triage
mitigation
root cause analysis
postmortems
follow-through remediation

Reduce operational toil by automating repetitive support, maintenance, and recovery work.

Partner closely with the product engineers from design through launch so every new app is deployable through a standard platform, not a one-off setup.

For AI-native products, design runtime guardrails around:

model/API credentials
provider rate limits
graceful degradation during vendor issues
latency and cost monitoring
fallback behavior for core AI workflows

What we’re looking for

5+ years of hands-on experience in SRE, platform engineering, production engineering, DevSecOps, or an infra-heavy backend role with direct production ownership
Strong experience with at least one major cloud platform such as AWS, GCP, or Azure
Strong infrastructure-as-code skills with Terraform, OpenTofu, Pulumi, or equivalent
Strong CI/CD and release engineering experience
Strong observability skills across logs, metrics, traces, dashboards, and alerting
Strong security fundamentals across IAM, secrets, network controls, vulnerability management, and secure delivery
Experience operating containers and/or serverless systems in production
Solid coding and scripting ability in at least one language such as TypeScript, Python, Go, or Bash
Experience with PostgreSQL, Redis, queues, background workers, and modern web app infrastructure
Experience owning on-call, incidents, postmortems, and recovery processes
Comfort working in a fast-moving startup where many products are launched from shared building blocks
Comfort reviewing and hardening AI-generated or AI-assisted code and infrastructure changes

Nice to have

Experience with multi-tenant SaaS products
Experience building internal developer platforms
Experience preparing for SOC 2, ISO 27001, or other security compliance audits
Experience with LLM/AI application operations
Experience with FinOps or cloud cost optimization
Experience supporting a product portfolio rather than a single application

Success in the first 90 days

Establish a standard production deployment template for all new products
Put centralized monitoring, logging, tracing, and alerting in place
Create and enforce a production readiness checklist for launches
Define initial SLOs for core products
Implement backups and successfully test restore procedures
Roll out a baseline security hardening standard across all production apps
Create incident response runbooks and escalation paths

Success metrics

Time from product-ready codebase to production launch
Change failure rate
Mean time to detect and mean time to recover
Uptime and latency performance against agreed SLOs
Number of critical production incidents
Backup restore success rate
Security findings closed within target time
Infrastructure cost per product and per active customer

Lifesight

