Senior Infrastructure Engineer
Team: Engineering
Location: Vancouver
Commitment: Full-time
Workplace Type: Remote
What You’ll Do:
- Build our AI Bot Control Plane — Design and build the critical orchestration layer between the Quandri application and the high-scale AI bots that power it. You’ll own the infrastructure that handles bot scheduling, lifecycle management, deployment pipelines, autoscaling, resource allocation, and real-time health monitoring — ensuring hundreds of concurrent bots run reliably, efficiently, and at scale.
- Shape our Observability Strategy — Architect and implement a full-stack observability platform built on OpenTelemetry, spanning logs, metrics, and distributed traces. You’ll work across the Grafana ecosystem (Prometheus, Loki, Tempo, Grafana) to give engineering deep visibility into system behavior, define and enforce SLOs/SLAs, build dashboards that drive decisions, and establish incident management workflows that keep us ahead of issues — not reacting to them.
- Lay the Foundations of our Security Posture — Implement and mature foundational security practices including SOC 2 compliance, incident response procedures, data loss prevention (DLP), Identity and Access Management (IAM), secrets management, and security monitoring. You’ll be instrumental in building a security-first culture across the engineering organization.
- Pioneer our MLOps Infrastructure — Partner with our Intelligence team to build and operate the MLOps infrastructure that powers our AI capabilities. This includes standing up and scaling our LLM service layer, model training pipelines, model versioning and lineage tracking, experiment management, and production model serving — enabling the team to iterate on models rapidly and deploy them with confidence.
- Scale our Data Infrastructure — Build and maintain the data backbone that captures and processes the high-volume event streams generated by our bots. You’ll design and evolve our data lake and data warehouse architecture on Databricks, enabling reliable ingestion, transformation, and access to the data that fuels our product, analytics, and machine learning initiatives.
- Champion an AI-First Developer Experience — Help define the strategy and build autonomous AI agents that proactively identify infrastructure issues, diagnose root causes, and remediate them — pushing us toward a self-healing platform. You’ll shape how we leverage AI to supercharge developer productivity and operational excellence across the engineering organization.
What We’re Looking For:
- 5+ years of experience in infrastructure, DevOps, platform, or SRE roles in a modern cloud environment
- Deep knowledge of AWS, Terraform (or equivalent IaC tools), and container orchestration (Kubernetes on EKS, or ECS, preferred)
- Experience with observability platforms and the Grafana ecosystem (Prometheus, Loki, Tempo, Grafana) or equivalent tools; familiarity with OpenTelemetry is a strong plus
- Experience improving CI/CD systems using GitHub Actions and Argo CD
- Understanding of IAM, secrets management, and compliance frameworks (SOC 2 experience a plus)
- Familiarity with MLOps concepts — model serving, training pipelines, LLM infrastructure, or experiment tracking
- Strong written and verbal communication skills; you can document, share, and teach effectively
- A thoughtful, collaborative problem-solver who’s excited to shape foundational systems
Nice to Have:
- Exposure to data infrastructure concepts — event streaming, data lakes, data warehouses, or platforms like Databricks
- Experience building or working with autonomous AI developer agents, AI-powered DevEx tooling, or self-healing infrastructure systems
Our Guiding Principles:
- Customers at the core. We put the customer at the center of all we do. At a basic level, we believe business success comes down to talking to customers and building something they want. We don’t take what customers say at face value; we think critically about it and build what they need. Customers are the core of everything we do, and our business exists to serve them. We prioritize their needs over all else within the company.
- Move with urgency. There are times when we need to move slowly and deliberately, but we default to acting fast and with urgency. We slow down when necessary, but this should be a deliberate choice. Businesses become more lethargic as they grow; this principle is designed to counter that tendency.
- Be curious. We understand the world by being curious and asking why. We aren’t satisfied with surface-level understanding, and seek a deeper understanding of why things are the way they are. Don’t take someone’s word for it or the answer “because that’s how we do it.” Understand why and dig deep.
- Excellence in execution. We know that what separates good from great is a high level of execution. We commit ourselves to excellence in everything that we do, from delivering an amazing product to writing a great email.
- Act like an owner. We’re all owners of the business and act like it. We follow through on commitments, own our results and think long-term.
- Fight for simplicity. The law of increasing functional information states that systems evolve to become more complex over time. At Quandri, we believe there is sophistication in simplicity; as such, we intentionally fight for streamlined solutions and are committed to the uncomplicated.
Compensation and Benefits:
- Base pay ranges from $130,000 to $170,000 CAD, depending on level of experience, performance, and choice of stock option compensation
- Employee stock options based on experience level
- Comprehensive health benefits, including a $500 Lifestyle Spending Account
- Four weeks of paid vacation per year
- Work anywhere in the world for 60 calendar days of the year
- Parental leave top-ups: 6 months for birthing parents, 8 weeks for non-birthing parents (up to $100,000 annual salary)
