Discover

Principal Infrastructure Engineer (DevOps/SRE)

Riverwoods, IL US
API Kubernetes AWS Git Python Java Bash Groovy
Description

Discover. A brighter future.

With us, you’ll do meaningful work from Day 1. Our collaborative culture is built on three core behaviors: We Play to Win, We Get Better Every Day & We Succeed Together. And we mean it — we want you to grow and make a difference at one of the world's leading digital banking and payments companies. We value what makes you unique so that you have an opportunity to shine.

Come build your future, while being the reason millions of people find a brighter financial future with Discover.

Job Description:

At Discover, be part of a culture where diversity, teamwork and collaboration reign. Join a company that is as focused on its employees as it is on its customers and is consistently awarded for both. We’re all about people, and our employees are why Discover is a great place to work. Be the reason we help millions of consumers build a brighter financial future and achieve yours along the way with a rewarding career.

As a Principal DevOps/Reliability Engineer, you will have an opportunity to make a positive impact across the organization. You will partner with product teams to identify and fix inefficiencies and to address system reliability and performance opportunities. Some examples include reviewing availability expectations, addressing performance issues, uncovering observability gaps, leading problem management, and driving capacity planning.

The Engineer draws on a vast repertoire of experience delivering high-impact engineering solutions to work intuitively. This person knows where to look if something breaks and is key in solving challenges quickly. You are skilled in finding new ways to build technical capabilities and influencing the adoption of best practices. As part of your day-to-day role, you actively manage risk and customer-impacting issues and escalate them to management.

Responsibilities:

  • Develop and run SRE tooling and observability using automation such as CI/CD and Kubernetes
  • Responsible for family level application reliability and resiliency
  • Define and implement SLOs/SLAs/SLIs; troubleshoot; build support playbooks; implement monitoring, alerting, and logging standards; conduct fragility and performance testing, etc.
  • Define and implement the disaster recovery (DR) plans needed for critical applications
  • Periodically pair/mob program with the teams to help build reliability thinking. 
  • Participate in failure point discussions, chaos testing and family level capacity management. 
  • Leverage metrics and scorecards to better drive site reliability and observability 
  • Ensure the proper level of documentation exists. 
  • Contributes to product delivery, influenced by strategic direction across Business Technology and the enterprise, including investment decisions (e.g., build vs. buy) across multiple products. Uses automation, system tools, open-source solutions, observability and 'security first' principles in daily work
  • Contributes to team agile ceremonies, leads demos and presentations, helps new engineers learn established norms 
  • Bridges desired Business features to BT capabilities and executes groundwork in requirement translations and design. Initiates detailed level solution design approaches across multiple domains, and guides team to achieve desired key software delivery capabilities using automated, coded enterprise and observability 
  • Participates in (internal/external) speaking and advocacy events. Coaches core technology communities and is actively engaged in understanding and researching new technologies 
  • Researches and adopts new technology solutions and ways to build technical capabilities 
  • Continues professional education and creates opportunities for broader engineering teams to learn industry best practices

 
Minimum Qualifications  

At a minimum, here’s what we need from you:  

  • Bachelor's – Computer Science or related
  • 6+ Years – Information Technology, (Software) Engineering, or related  
  • Internal applicants only: technical proficiency rating of proficient on the Dreyfus engineering scale  

Preferred Qualifications 

  • 3+ years in an SRE or DevOps role
  • In-depth understanding of cloud infrastructure/technologies, especially AWS (VPC / security groups / EC2 / storage / load balancers)
  • Experience with DevOps tools, processes, and culture. 
  • Experience with CI/CD pipelines with Jenkins, Git/GitHub, GitLab or similar
  • Extensive experience leading customer-facing systems in a mission-critical environment.
  • Advanced experience with programming and/or scripting languages (e.g. Python, Java, bash, Groovy) 
  • In-depth knowledge of the application development landscape: Java, REST APIs, design patterns and CI/CD.
  • Extensive experience with monitoring and observability tools/technologies (e.g., Grafana, Kibana, Datadog)
  • Creation of standardized monitoring dashboards in cloud platforms for proactive monitoring of application and infrastructure health 
  • In-depth knowledge of non-functional requirements (NFRs), including pressure/chaos testing, performance testing, and penetration testing

Bonus Points If You Have:  

  • Reliability best practices in the cloud native environment 
  • Operational Readiness strategies and best practices 
  • Risk management knowledge 


Application Deadline:

The application window for this position is anticipated to close on Apr-14-2024. We encourage you to apply as soon as possible. The posting may be available past this date, but it is not guaranteed.

Compensation:

The base pay for this position generally ranges from $104,000.00 to $175,600.00. Additional incentives may be provided as part of a market competitive total compensation package. Factors such as, but not limited to, geographical location, relevant experience, education, and skill level may impact the pay for this position.

Benefits:

We also offer a range of benefits and programs based on eligibility. These benefits include:

  • Paid Parental Leave
  • Paid Time Off
  • 401(k) Plan
  • Medical, Dental, Vision, & Health Savings Account
  • STD, Life, LTD and AD&D
  • Recognition Program
  • Education Assistance
  • Commuter Benefits
  • Family Support Programs
  • Employee Stock Purchase Plan

Learn more at mydiscoverbenefits.com.

What are you waiting for? Apply today!

All Discover employees place our customers at the very center of our work. To deliver on our promises to our customers, each of us contributes every day to a culture that values compliance and risk management.

Discover is committed to a diverse and inclusive workplace. Discover is an equal opportunity employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status, or other legally protected status. (Know Your Rights & Pay Transparency Nondiscrimination Provision)

Discover complies with federal, state, and local laws applicable to qualified individuals with disabilities and is committed to providing reasonable accommodations. If you require a reasonable accommodation to search for a position, to complete an application, and/or to participate in an interview, please email HireAccommodation@discover.com. Any information you provide regarding your accommodation needs will be kept confidential and will only be used to determine and provide necessary accommodation.
