Vizcom

Senior Platform and Reliability Engineer

San Francisco, CA
TypeScript Node.js PostgreSQL Redis Kubernetes
Description

Senior Platform & Reliability Engineer (SRE)

Department: EPD (Engineering, Product, Design)

Location: San Francisco

Employment Type: FullTime

Agency Notice: We are not currently working with recruiting agencies for this role. Please do not contact Vizcom employees regarding this position. Any resumes submitted without a prior agreement will be considered unsolicited.

About Vizcom

Vizcom is a visual creation platform that combines modern web tooling with AI-powered workflows. Our stack includes React/TypeScript frontend, Node/Koa + PostGraphile API services, PostgreSQL, Redis, BullMQ queues, and Kubernetes-based production infrastructure.

We’re hiring a senior owner of stability and infrastructure to ensure the platform is reliable, fast, and resilient as we scale.

Role Mission

Own service reliability end-to-end: prevent incidents, reduce blast radius when failures happen, and lead fast, high-quality recovery when production degrades.

This is a hands-on technical leadership role with authority to set reliability standards and enforce production guardrails.

Compensation

$200,000 – $250,000 base salary + meaningful equity


What You’ll Own

  • Reliability bar: Set and enforce SLIs/SLOs/error budgets for critical user flows.

  • Production architecture resilience: Drive failure isolation across API, workers, queues, and dependencies so one subsystem cannot take down core access.

  • Kubernetes runtime reliability: Define probe contracts, rollout/rollback standards, graceful shutdown behavior, scaling/resource policies, and startup safety.

  • Queue + job safety (BullMQ/Redis): Own poison pill containment and workload isolation.

  • Incident command quality: Lead Sev1/Sev2 response end-to-end (containment, communications, technical direction, RCA, corrective action execution).

  • Reliability operating system: Own observability quality (signals over noise), on-call effectiveness, runbooks, and postmortem discipline.

  • Release safety authority: Gate risky deploys and enforce reliability guardrails when production health is at risk.

Traits We’re Looking For

  • Calm, structured incident commander under pressure.

  • Thinks in failure modes and blast radius by default.

  • Pragmatic: can stabilize quickly, then implement durable fixes.

  • High ownership and strong written communication.

First 90 Days

  • Establish baseline reliability metrics and identify top platform risks.

  • Tighten incident response mechanics (roles, comms cadence, runbooks, status updates).

  • Deliver high-impact hardening fixes across probes/startup paths/queue safety.

  • Publish a prioritized 6–12 month reliability roadmap with clear ownership and milestones.

If possible please include one incident you personally led and send to [email protected] :

1) what failed,

2) how you contained it,

3) what permanent fixes you shipped, and measured.

Vizcom
Vizcom

0 applies

0 views

There are more than 50,000 engineering jobs:

Subscribe to membership and unlock all jobs

Engineering Jobs

60,000+ jobs from 4,500+ well-funded companies

Updated Daily

New jobs are added every day as companies post them

Refined Search

Use filters like skill, location, etc to narrow results

Become a member

🥳🥳🥳 452 happy customers and counting...

Overall, over 80% of customers chose to renew their subscriptions after the initial sign-up.

To try it out

For active job seekers

For those who are passive looking

Cancel anytime

Frequently Asked Questions

  • We prioritize job seekers as our customers, unlike bigger job sites, by charging a small fee to provide them with curated access to the best companies and up-to-date jobs. This focus allows us to deliver a more personalized and effective job search experience.
  • We've got over 200,000 jobs from 15,000+ vetted companies. No fake or sleazy jobs here!
  • We aggregate jobs from 15,000+ companies' career pages, so you can be sure that you're getting the most up-to-date and relevant jobs.
  • We're the only job board *for* software engineers, *by* software engineers… in case you needed a reminder! We add thousands of new jobs daily and offer powerful search filters just for you. 🛠️
  • Every single hour! We add 2,000-3,000 new jobs daily, so you'll always have fresh opportunities. 🚀
  • Typically, job searches take 3-6 months. EchoJobs helps you spend more time applying and less time hunting. 🎯
  • Check daily! We're always updating with new jobs. Set up job alerts for even quicker access. 📅

What Fellow Engineers Say