Senior Software Engineer II, Observability
Team: Platform Engineering
Location: Canada
Commitment: Full-time
Workplace Type: Remote
Here’s How You Make an Impact:
Build and Scale Observability-as-Code
- Design and maintain Python tooling and Terraform modules that standardize Datadog configuration across services.
- Eliminate manual setup by codifying monitors, dashboards, SLOs, and alerting patterns.
- Improve consistency, repeatability, and reliability of observability across the organization.

Establish Reliable & Standardised Instrumentation
- Define and implement observability blueprints that integrate high‑fidelity metrics, logs, and traces into the development lifecycle.
- Codify best practices so teams get out-of-the-box visibility without needing deep observability expertise.
- Raise the baseline for service health, debuggability, and operational readiness.

Optimize Datadog Usage and Cost
- Own critical parts of the Datadog platform configuration.
- Improve data quality, signal-to-noise ratio, and alert reliability.
- Partner with teams to adopt telemetry effectively while managing ingestion and alerting costs.

Maintain and Evolve Platform Components
- Upgrade and maintain tracers, agents, and shared observability libraries.
- Ensure upgrades are automated, backwards-compatible, and minimally disruptive to product teams.
- Reduce operational risk by improving rollout and validation processes.

Integrate Observability Across Infrastructure
- Collaborate with Platform and Infrastructure teams to embed monitoring into systems such as Kafka, gRPC services, Kubernetes, and AWS-managed services.
- Improve production visibility and reduce mean time to detect (MTTD) and resolve (MTTR) incidents.

Deliver High-Quality, Production-Ready Code
- Write clean, well-tested, and maintainable Python code and Terraform modules.
- Participate in architecture and design reviews; provide thoughtful feedback in code reviews.
- Take ownership of projects end-to-end, from design and implementation through production rollout and support.

Mentorship & Collaboration
- Assist team members to solve problems and develop their own skills.
- Foster a collaborative mindset within the team.
You Thrive Here By Possessing the Following:
- Degree in Computer Science or a related field.
- 7+ years of experience in application development, platform engineering, or developer tooling.
- High proficiency in Python; solid experience with Terraform.
- Hands-on experience using Datadog for metrics, logging, tracing, dashboards, monitors, and alerts.
- Experience with containerized and cloud-native environments (e.g., Kubernetes, Kafka, AWS, gRPC, Lambda).
- Proven ability to independently drive medium-to-large initiatives from design to delivery.
- Comfortable making pragmatic tradeoffs to deliver reliable, scalable solutions.
- A strong product mindset for internal tools.
- Passion for reducing cognitive load, eliminating toil, and making observability easy to adopt by default.
- Solid understanding of modern web applications and distributed systems.
- Knowledge of how observability applies to high-throughput, highly available systems.
- Clear written and verbal communication skills.
- Ability to influence technical direction through design discussions, documentation, and hands-on implementation.
- Comfortable partnering with product, platform, and infrastructure teams.
