Observability Engineer
Department: Engineering
Location: Las Vegas, Nevada
Employment Type: Full-Time
Our mission at TensorWave Cloud is to build seamless, secure, reliable, and resilient AI infrastructure at scale, eliminating barriers and challenging the status quo to empower builders and support AI innovation.
About the role
We are looking for an Observability Engineer who is deeply obsessed with Grafana, Prometheus, and modern observability practices. This role exists to ensure our systems are measurable, understandable, and debuggable at all times.
You will own the observability stack end-to-end — from instrumentation standards to dashboards, alerts, and signal quality — and work closely with infrastructure, platform, and application teams to make sure nothing important fails silently.
If you think about metrics before features, believe bad alerts are worse than no alerts, and treat Grafana dashboards as first-class products, this role is for you.
Responsibilities
Own and evolve our observability and monitoring platform, with Grafana and Prometheus at its core
Design, build, and maintain high-quality metrics pipelines using Prometheus and related tooling
Create clear, actionable Grafana dashboards that tell a story — not just charts
Define and maintain alerts that are meaningful, actionable, and low-noise
Establish and enforce observability standards across services (metrics, logs, traces)
Partner with engineering teams to instrument applications correctly
Lead improvements to alerting strategies, SLOs, and SLIs
Support incident response by helping teams quickly understand what broke and why
Continuously evaluate and improve signal quality, cardinality, and cost
Identify observability gaps and eliminate blind spots before they become outages
You Are Obsessed With:
Grafana dashboards that instantly explain system health
Prometheus metrics that are intentionally designed, not accidental
Alerts that wake people up only when action is required
Monitoring that scales with system complexity
Observability as a product, not an afterthought
Required Experience
Strong hands-on experience with Grafana and Prometheus
Deep understanding of metrics-based observability
Experience designing monitoring and alerting systems at scale
Strong knowledge of alerting best practices (burn rates, SLO-based alerts, noise reduction)
Experience working with distributed systems and cloud or Kubernetes environments
Ability to reason about system behavior using telemetry
Comfortable working across teams to improve instrumentation and visibility
Preferred Experience
Experience with OpenTelemetry
Familiarity with logs and traces (Loki, Tempo, Jaeger, etc.)
Kubernetes observability experience
Experience operating observability systems in high-scale or production-critical environments
Infrastructure-as-Code experience (Terraform, Helm, etc.)
What We Bring
Mission-driven company
Competitive Salary
Stock Options
100% paid Medical, Dental, and Vision insurance
Life and Voluntary Supplemental Insurance
Short Term Disability Insurance
Flexible Spending Account
401(k)
Flexible PTO
Paid Holidays
Parental Leave
Mental Health Benefits through Spring Health
We’re looking for resilient, adaptable people to join our team, people who believe in the mission and think at massive scale. The solutions that worked on a handful of devices will not work at exascale. Be prepared to be pushed daily, to learn a lot, and literally build the future.
TensorWave is an equal opportunity employer, committed to fostering an inclusive and supportive workplace. All qualified applicants and candidates will receive consideration for employment without regard to race, color, religion, sex, disability, age, national origin, or veteran status.