Principal Member of Technical Staff - DevOps (US Citizen Required)
Location: United States
As a Principal Member of Technical Staff (DevOps), you will play a pivotal role in building and operating the next-generation, AI-first Electronic Health Record platform. This role blends strong software engineering fundamentals with Site Reliability Engineering (SRE) and production engineering practices to deliver highly scalable, resilient, secure, and observable cloud-native services. You will design, develop, and own complex distributed systems end-to-end, from architecture and implementation to production operations, reliability, and continuous improvement.
Working closely with technical leads and cross-functional teams, you will ensure services are built using modern engineering principles with a strong focus on availability, scalability, performance, operability, and cost-awareness. You will embed SRE practices such as SLI/SLO definition, error budgets, observability, incident response, and automated remediation into the development lifecycle. You will proactively improve system reliability through automation, data-driven insights, structured operational workflows, and production engineering excellence (including safe experimentation and resilience testing where appropriate).
You will also leverage AI-assisted development tools to accelerate delivery, improve troubleshooting, and enhance engineering productivity while maintaining rigorous standards for code quality, security, and reliability.
Responsibilities
- Design, build, and operate scalable, secure, and maintainable distributed services in a cloud-native, microservices-based environment.
- Drive architecture and implementation decisions aligned with reliability, performance, and operability requirements.
- Deliver high-quality code with strong CI/CD, automated testing, and release engineering practices.
- Define and operationalize SLIs/SLOs, manage error budgets, and continuously improve service reliability.
- Build and enhance observability across services (metrics, logs, traces), including actionable dashboards and alerting.
- Lead and participate in incident management, on-call/operational readiness, root cause analysis (RCA), and blameless postmortems.
- Build, improve, and standardize operational workflows (runbooks, playbooks, change management, escalation paths, and service readiness reviews).
- Develop and maintain automation for operational excellence: self-healing, automated remediation, drift detection, and reliability guardrails.
- Use automation tools and frameworks to reduce toil and increase consistency across environments.
- Apply AI tools to support coding, debugging, alert/incident triage, and operational insights (AIOps-aligned workflows where appropriate).
Minimum Qualifications
- BS/MS in Computer Science (or equivalent practical experience).
- Must be a U.S. citizen with the ability to obtain and maintain a federal security clearance.
- At least 7 years of relevant software engineering experience.
- Proficient in at least one (preferably two) of: Java, C/C++, Golang.
- Hands-on experience in SRE or similar roles (DevOps / Production Engineering).
- Proven, hands-on experience with automation tools and frameworks (e.g., infrastructure/app automation, CI/CD automation, operational runbook automation).
- Strong scripting skills (e.g., Python, Bash, or similar).
- Demonstrated experience building or improving operational workflows in production environments.
- Strong understanding of reliability engineering, monitoring/observability, and incident management (including RCA and postmortems).
AI-Assisted Engineering
- Demonstrated experience using AI-assisted development tools/IDEs (e.g., Codex, Claude, Cline, or similar) and integrating them into development workflows to improve productivity and reduce turnaround time.
- Experience using ChatGPT, Claude, or similar models to support development and operational tasks (e.g., code generation, debugging, documentation, triage).
Preferred Qualifications
- Experience with containers, Kubernetes, and operating reliable services at scale.
- Familiarity with MCP tools/servers and multi-tool orchestration / skills-based frameworks.
- Familiarity with “AI-accelerated” development approaches (rapid prototyping plus disciplined engineering, testing, and operational readiness).
- Strong CS fundamentals: data structures, algorithms, operating systems, networking, and distributed systems.
- Excellent communication and collaboration skills; comfortable working across teams and communicating technical topics to senior stakeholders.
- Experience contributing to intelligent automation and AIOps-driven workflows.
About Us
Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives.
True innovation starts when everyone is empowered to contribute. That’s why we’re committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs.
We’re committed to including people with disabilities at all stages of the employment process. If you require accessibility assistance or accommodation for a disability at any point, let us know by emailing [email protected] or by calling 1-888-404-2494 in the United States.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability and protected veterans’ status, or any other characteristic protected by law. Oracle will consider for employment qualified applicants with arrest and conviction records pursuant to applicable law.
