Fidelity

Principal, Site Reliability Engineer

US
Python Node.js Java Shell Kubernetes AWS
This job is closed.
Job Description:

We are hiring a Principal Site Reliability and Support Specialist, a motivated technologist and leader, to join our support organization. In this role, you will serve as a production support and SRE specialist for the infrastructure and applications of the FFIO business units. Your key partners are the Fund Accounting (Fixed Income and Equity, Money Market and Institutional Products), Pricing, Cash Hub, Trade Hub, and DAL accounting technology and business teams.
 

The team has a diverse technological background, and the responsibilities offer a variety of challenges. Ideal candidates will have a background in either software engineering or systems engineering with a desire to learn the other, or previous experience as an SRE. We are looking for a systems-thinking specialist who will help teams scale through production insights, operational automation, developer guidance, and real-time metrics. This is a great opportunity for anyone looking to lead, learn, and use their cloud, database, and middle-tier technical skills and experience to drive production stability, reliability, and resiliency.
 

The Expertise You Have and The Skills You Bring

  • Bachelor’s degree or higher in a technology-related field (such as Engineering, Computer Science, or Information Technology) required; a master’s degree is a plus.

  • 5+ years of hybrid experience across production support, development, and SRE. Hands-on experience deploying and/or supporting highly distributed, multi-tiered systems at scale.

  • 2+ years of experience in cloud development (AWS) and migration; experience building and operating highly resilient platforms in AWS cloud environments.

  • 2-4+ years of experience in software development with Python, Node.js, and Java, with a focus on SDLC and automation.

  • A self-starter and team player who can independently manage multiple responsibilities in a dynamic environment.

  • Strong hands-on experience and ability to automate with scripting languages such as Python and shell scripting.

  • Solid understanding of cloud computing and DevOps concepts, including CI/CD pipelines.

  • Hands-on Kubernetes skills and knowledge.

  • Expert, hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.).

  • Experience with instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale.

  • Proven experience in maintaining scalability and resiliency in complex environments.

  • Proven experience in implementing advanced observability practices and techniques at scale.

  • Ability to triage, perform root cause analysis, and be decisive under pressure.

  • Experience managing and interpreting large datasets using query languages and visualization tools.

  • Excellent verbal and written communication skills, and the ability to tailor them to various audiences.

  • Strong curiosity and the ability to learn new technologies and tools and bring them to our developers.

  • Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner to build and maintain effective relationships.

  • Familiarity with Agile Software Development Methodologies.

  • Highly effective business communication and influencing skills.

  • AWS and AWS/EKS certifications are a plus.

The Team

Our Site Reliability Engineering and production support services group within Enterprise Infrastructure for Fidelity Fund and Investment Operations (FFIO) combines operations excellence with development experience to deliver services at high scale and high availability, with resilience, by using automation and infrastructure as code. We build reliability into our ecosystem by applying best practices in resiliency engineering, automation, and observability, in addition to core production support such as incident, change, problem, and release management.

We partner with our key stakeholders in Information Technology and business teams to deploy new functionality, software fixes, and SRE features, and to support applications across a wide range of infrastructures and products.

Category:

Information Technology

Fidelity’s working model blends the best of working offsite with maximizing time together in person to meet associate and business needs. Currently, most hybrid roles require associates to work onsite all business days of one assigned week per four-week period (beginning in September 2024, the requirement will be two full assigned weeks). 
