Grail

Staff Site Reliability Engineer #3718

Menlo Park, CA Remote Hybrid
USD 180k - 210k
Kubernetes Python Bash Ansible Terraform AWS Docker
Description
GRAIL is a healthcare company whose mission is to detect cancer early, when it can be cured. GRAIL is focused on alleviating the global burden of cancer by developing pioneering technology to detect and identify multiple deadly cancer types early. The company is using the power of next-generation sequencing, population-scale clinical studies, and state-of-the-art computer science and data science to enhance the scientific understanding of cancer biology, and to develop its multi-cancer early detection blood test. GRAIL is headquartered in Menlo Park, CA with locations in Washington, D.C., North Carolina, and the United Kingdom. GRAIL, LLC is a wholly-owned subsidiary of Illumina, Inc. (NASDAQ:ILMN). For more information, please visit www.grail.com.

GRAIL is seeking a Staff Software Engineer for our Site Reliability Engineering (SRE) team to help us improve the security and reliability of production systems that are critical to our mission to detect cancer early and save lives. You will contribute to the architecture, design, development, and implementation of critical cloud-based infrastructure, services, and applications, and be responsible for their secure, healthy, and reliable operation. You are someone who enjoys learning and implementing the industry's best technology trends and practices. You foster and contribute to a creative and collaborative culture that delivers results. You embrace ambiguity and enjoy exploring new technologies to deliver robust, scalable solutions.

This is a hybrid role that requires you to be onsite 2 days a week in Menlo Park, CA.

Responsibilities

  • Ensure High Availability: Implement and maintain resilient cloud architectures, monitor system performance, and proactively identify and resolve potential bottlenecks or points of failure.
  • Incident Management: Play an active role in production on-call, responding swiftly to troubleshoot and resolve production issues. Minimize service disruptions and downtime by conducting thorough triaging and debugging of product or system issues. Continuously optimize the on-call process for sustainability and efficiency.
  • Automation and Tooling: Develop and maintain automation scripts, tools, and processes to streamline system deployment, monitoring, and management tasks. Your contributions will be vital in efficiently scaling cloud operations.
  • Performance Optimization: Optimize cloud infrastructure and applications for performance, scalability, and cost-effectiveness.
  • Security and Compliance: Collaborate with security engineers to implement best practices and ensure compliance with security standards and policies.
  • Monitoring and Alerting: Design and configure advanced monitoring systems to gain insights into system behavior, set up alerts, and respond proactively to potential issues. Create and maintain comprehensive dashboards and playbooks for production on-call.
  • Software Development Consultation: Engage actively in the entire software development lifecycle. Participate in system design reviews and provide valuable Site Reliability Engineering (SRE) insights during launch reviews, influencing and enhancing system architecture.

Preferred Qualifications

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 3+ years of professional experience maintaining production systems on cloud-based services and infrastructure.
  • 8+ years of software development experience in one or more programming languages, with a primary focus on leveraging and working on cloud-based services and infrastructure.
  • Strong knowledge of the AWS cloud platform.
  • Practical experience with containerization technologies, including Docker and Kubernetes.
  • Familiarity with Python, Bash scripting, and Ansible.
  • Familiarity with infrastructure-as-code tools such as Terraform is essential.
  • Solid understanding of databases, networking, security principles, and best practices.
  • Proficiency in using monitoring and alerting tools to detect and respond to potential issues effectively.

Desired Skills

  • AWS Certifications (such as Solutions Architect, Security, etc.).
  • Experience in a regulated industry or the healthcare field.

The expected full-time annual base pay scale for this position is $180,000 - $210,000. Actual base pay will consider skills, experience, and location.

Based on the role, colleagues may be eligible to participate in an annual bonus plan tied to company and individual performance, or an incentive plan. We also offer a long-term incentive plan to align company and colleague success over time.

In addition, GRAIL offers a progressive benefit package, including flexible time off, a 401(k) with a company match, and, alongside our medical, dental, and vision plans, carefully selected mindfulness offerings.

GRAIL is an Equal Employment Opportunity Employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, protected veteran status, disability, or any other legally protected status. We will reasonably accommodate all individuals with disabilities so that they can participate in the job application or interview process, perform essential job functions, and receive other benefits and privileges of employment. Please contact us to request accommodation. GRAIL maintains a drug-free workplace.