Job Description:
We are hiring a Principal Site Reliability and Support specialist, a motivated technologist and leader, to join our support organization. In this role, you will serve as a production support and SRE specialist for FFIO Business Units Infrastructure and Applications. Your key partners are the Fund Accounting (Fixed Income and Equity, Money Market and Institutional Products), Pricing, Cash Hub, Trade Hub and DAL accounting Technology / Business Teams.
The team comes with a diverse technological background, and the responsibilities provide the opportunity for a variety of challenges. Ideal candidates will have a background in either software engineering or systems engineering with a desire to learn the other, or previous experience as an SRE. We are looking for a systems-thinking specialist who will help the teams scale through production insights, operational automation, developer guidance, and real-time metrics. This is a great opportunity for anyone looking to lead, learn and use their Cloud, Database, and Middle-tier technical skills and experience to drive production stability, reliability, and resiliency.
The Expertise You Have and The Skills You Bring
Bachelor’s degree or higher in a technology-related field (such as Engineering, Computer Science, or Information Technology) required; a master’s degree is a plus.
A minimum of 5 years of hybrid experience across Production Support, Development and SRE. Hands-on experience deploying and/or supporting highly distributed multi-tiered systems at scale.
A minimum of 2 years of experience in cloud development (AWS) and migration; experience building and operating highly resilient platforms in AWS Cloud environments.
2 - 4+ years of experience in software development with Python, NodeJS, Java with a focus on SDLC and automation.
A self-starter and team player who can independently manage multiple responsibilities in a dynamic environment.
Strong hands-on experience and ability to automate with various scripting languages such as Python, Shell Scripting, etc.
Solid understanding of Cloud Computing and DevOps concepts, including CI/CD pipelines.
Hands-on Kubernetes skills and knowledge.
Expert, hands-on experience with one or more Observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.).
Experience with instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale.
Proven experience in maintaining scalability and resiliency in complex environments.
Proven experience in implementing advanced observability practices and techniques at scale.
Ability to triage, perform root cause analysis, and be decisive under pressure.
Experience managing and interpreting large datasets using query languages and visualization tools.
Excellent verbal and written communication skills and the ability to tailor them to various audiences.
A high level of curiosity and a desire to learn new technologies and tools and bring them to our developers.
Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner to build and maintain effective relationships.
Familiarity with Agile Software Development Methodologies.
Highly effective business communication and influencing skills.
AWS and AWS / EKS certifications are a plus.
The Team
Our Site Reliability Engineering and production support services group within Enterprise Infrastructure for Fidelity Fund and Investment Operations (FFIO) combines Operations Excellence with the Development Experience to deliver high-scale, high-availability, resilient services by using automation and Infrastructure as Code. We build reliability into our ecosystem by applying best practices in Resiliency Engineering, Automation, and Observability, in addition to core production support functions such as Incident, Change, Problem and Release management.
We partner with our key stakeholders in Information Technology and business teams to deploy new functionalities, software fixes, SRE Features and support applications in a wide range of infrastructures and products.
Category: Information Technology
Fidelity’s working model blends the best of working offsite with maximizing time together in person to meet associate and business needs. Currently, most hybrid roles require associates to work onsite all business days of one assigned week per four-week period (beginning in September 2024, the requirement will be two full assigned weeks).