Braze

Senior Site Reliability Engineer

Remote Canada
MongoDB Kubernetes Shell Ruby Redis Kafka Chef Terraform Docker Go PostgreSQL
Description

Braze (Nasdaq: BRZE) is a leading, comprehensive customer engagement platform that powers interactions between consumers and brands they love. With Braze, global brands like Burger King, Delivery Hero, HBO Max, Mercari, and Venmo can ingest and process customer data in real time, orchestrate and optimize contextually relevant, cross-channel marketing campaigns, and continuously evolve their customer engagement strategies. And we do it at scale – last fiscal year our customers used Braze to send approximately 1.5 trillion messages to billions of monthly active users.

But we’re so much more than our platform. Although we’ve recently grown to a team of over 1,300 people, Braze still buzzes with energy, collaboration, and transparency. We value curiosity, individuality, and tenacity—as part of the team, you’ll be encouraged to take your seat at the table and create your own destiny. Our values are inspired by our employees, which means Braze is a place where you can truly be yourself. We're growing, with a focus on building for the long term under tenured leadership and continuing to evolve for the better.

Need more proof? Braze is proudly certified as a Great Place to Work® in the U.S. and the UK. In 2022, Braze ranked #1 on Fortune’s Best Small and Medium Workplaces in New York, #5 on Fortune’s Best Workplaces for Millennials in the US, and #11 on Fortune’s Best Medium-Sized Workplaces for Women in the UK.

You’ll find many of us at headquarters in New York City or around the world in Austin, Berlin, Chicago, London, Paris, San Francisco, Singapore, Tokyo, and Toronto.

Site Reliability Engineers (SREs) are responsible for keeping all internal-facing services and platforms running smoothly. In a nutshell, SREs ensure site uptime. SREs blend the roles of system administrator and software engineer, applying sound engineering principles, operational discipline, and mature automation to the environments and infrastructure services we provide. We specialize in systems, whether that means networking, the Linux kernel, or a more specific interest in scaling, algorithms, or distributed systems.

Our team improves automation and infrastructure reliability, and empowers Braze’s other engineering teams to easily leverage the infrastructure products and platforms we create. Braze operates at a massive scale with over 4.7 billion monthly active users across our customers, collecting hundreds of billions of data points each month, and sending billions of messages to end users daily. We use a diverse technology stack rooted in Ruby on Rails, MongoDB, Redis, Kafka, Kubernetes, and more. As a Site Reliability Engineer at Braze, you will collaborate with your team and the engineering teams that consume our platforms to continuously improve the infrastructure, automation, and tooling used to build internal products from these technologies.

WHAT YOU'LL DO

  • Partner with Braze’s engineering teams on:
    • Architecting products to effectively utilize infrastructure platforms in a scalable, reliable manner.
    • Debugging reliability and scalability issues across all stack layers, including the products built using our infrastructure platforms.
    • Building monitoring and alerting that fires on symptoms rather than only on outages.
    • Ensuring that Braze meets our strict enterprise-grade SLAs with customers.
  • Develop Braze’s internal platform infrastructure:
    • Create Infrastructure as Code using Chef, Terraform, and Kubernetes.
    • Develop deployment pipelines for applications in multiple languages using Docker, Kubernetes, etc.
    • Provide centralized/common tooling, services, and automation frameworks that are critical for scaling operations, capacity management, reducing operational pain, and improving the day-to-day workflow of Braze’s engineering teams.
  • Manage incidents:
    • Be on a PagerDuty rotation to respond to availability incidents and provide support for other engineers.
    • Use your on-call shift to prevent incidents from ever happening.
    • Run retrospectives on every incident to turn lessons into system improvements, changes, automation, etc.

WHO YOU ARE

  • 4+ years of experience as a Software, DevOps, or Site Reliability Engineer
  • You think about systems - interfaces, boundaries, edge cases, failure modes, behaviors, specific implementations.
  • Have an urge to collaborate, document, and deliver quickly.
    • Collaborating across global remote teams, often asynchronously.
    • Documenting everything so you don't need to learn the same thing (or plan the same work) twice.
    • Delivering fast to delight our customers, even internal ones.
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
  • Have a desire to solve everyday challenges facing software engineers and automate their toil away.
  • Have an excellent ability to manage multiple tasks and expectations at once.
  • Know your way around Linux and the Unix shell.
  • Have strong programming skills - Ruby and/or Go preferred
  • Have experience with Docker, Kubernetes, Terraform, or similar IaC technologies
  • Have experience with MongoDB, Redis, Kafka, Postgres, or similar data technologies


WHAT WE OFFER

From comprehensive benefits to remote availability to flexible time off, we’ve got you covered so you can prioritize work-life harmony.

  • Competitive compensation that includes equity
  • Retirement and Employee Stock Purchase Plans
  • Flexible paid time off
  • Comprehensive benefit plans covering medical, dental, vision, life, and disability
  • Family services that include fertility benefits and equal paid parental leave
  • Global presence, dog-friendly offices, and remote availability
  • Professional development supported by formal career pathing, learning platforms, and tuition reimbursement
  • Community engagement opportunities throughout the year, including an annual company-wide Volunteerism Week
  • Employee Resource Groups that provide supportive communities within Braze
  • Collaborative, transparent, and fun culture recognized as a Great Place to Work®

Details of these benefit plans will be provided if a candidate receives an offer of employment. Benefits may vary by location.

Please see our Candidate Privacy Policy for more information on how Braze processes your personal information during the recruitment process and, if applicable based on your location, how you can exercise any privacy rights.
