Distributed Systems Engineer - High-Availability Dispatch
Location: Remote
Department: Engineering
Who we are:
Glydways is reimagining what public transit can be. We believe that mobility is the gateway to opportunity—connecting people to housing, education, employment, commerce, and care. By making transportation more accessible, affordable, and sustainable, we empower communities to thrive and unlock economic and social prosperity.
Our mission is to revolutionize transit with a solution that delivers high capacity, exceptional user experiences, unmatched affordability, and minimal environmental impact.
The Glydways system is a groundbreaking network of carbon-neutral, interconnected transit pathways powered by standardized autonomous vehicles on dedicated roadways. Operating 24/7 with on-demand access, it offers personalized and efficient mobility—without the burden of heavy upfront infrastructure costs or ongoing taxpayer subsidies.
With Glydways, we’re building more than a transportation system; we’re creating a future where everyone, everywhere, has the freedom to move.
About the Role:
Glydways’ Dispatch system is the centralized brain that coordinates our autonomous vehicle fleet. We’re looking for a senior distributed systems / backend engineer to design and implement state sharing between Dispatch instances, hot failover mechanisms, and robustness testing for this safety-critical, real-time service. This is an application-layer role: you will work primarily on the Dispatch codebase (C++), making stateful services correct and resilient under failure, not on generic DevOps, cloud account management, or CI/CD pipelines. You’ll partner closely with autonomy and ops teams to harden behavior, design recovery flows, and drive down flaky and unsafe production states over time.
Candidates whose experience is limited to Kubernetes administration, CI/CD tooling, or cloud configuration without owning stateful application behavior are not a fit for this role.
Responsibilities:
- Design and implement state sharing and replication between multiple Dispatch instances (tickets, journeys, vehicle state, restrictions).
- Build leader election and failover mechanisms (active/standby, hot/warm backup) that guarantee a single authoritative Dispatch instance at any time and a clean handoff on failure.
- Harden Dispatch behavior for restart-safety and idempotency, ensuring retries, replays, and partial failures do not cause double assignment, inconsistent state, or unsafe conditions.
- Design and run stress, load, and fault-injection tests (including chaos experiments) to validate Dispatch behavior under high load, network issues, and process crashes.
- Improve system hardening and recovery flows, defining how Dispatch enters safe modes, recovers from faults, and resumes normal operation in a controlled way.
- Extend and tune observability for Dispatch (logs, metrics, traces, SLOs) so state divergence, failover events, and backlog issues are visible and diagnosable.
- Collaborate with autonomy, product, and ops teams to translate algorithmic and operational requirements into concrete guarantees around state, failover, and robustness.
- Participate in on-call and incident response for Dispatch, lead root-cause analysis for reliability issues, and drive long-term fixes into the application code and architecture.
Knowledge, Skills and Abilities:
- Proven experience designing and shipping stateful distributed services that stay correct under failures.
- Strong programming background in a systems language (C++ strongly preferred) and comfort working at the application layer (routing, tickets, vehicle state, safety envelopes).
- Hands-on experience with leader election / primary–secondary patterns, active/standby or similar, and state replication / recovery (snapshots, event logs, replay, or equivalent).
- Deep understanding of idempotent operations and message semantics (retries, duplicates, out-of-order messages) in networked, message-driven systems (TCP/UDP, gRPC, pub/sub, etc.).
- Experience designing and running stress, load, soak, and fault-injection/chaos tests for distributed systems, and using their results to drive system hardening.
- Strong observability and incident-response skills: defining SLOs, instrumenting metrics/traces, debugging complex failure modes, and leading postmortems for stateful services.
- Safety-critical or mission-critical mindset: familiarity with failure-mode analysis and designing for fail-safe / fail-operational behavior is a plus.
- Experience with cloud platforms is a plus, but this is not a pure DevOps or CI/CD role; candidates must have meaningful ownership of application-level behavior and state.
Glydways provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
