Bank of America

Site Reliability Engineer II

US
Shell Hadoop Kubernetes Python Perl Spark Kafka Oracle PostgreSQL Java Docker SQL Cassandra Ansible
Description

Job Description:

About us:

At Bank of America, we are guided by a common purpose to help make financial lives better through the power of every connection. Responsible Growth is how we run our company and how we deliver for our clients, teammates, communities and shareholders every day.

One of the keys to driving Responsible Growth is being a great place to work for our teammates around the world. We’re devoted to being a diverse and inclusive workplace for everyone. We hire individuals with a broad range of backgrounds and experiences and invest heavily in our teammates and their families by offering competitive benefits to support their physical, emotional, and financial well-being.

Bank of America believes in the importance of both working together and offering flexibility to our employees. We use a multi-faceted approach for flexibility, depending on the various roles in our organization.

Working at Bank of America will give you a great career with opportunities to learn, grow and make an impact, along with the power to make a difference. Join us!

Job Description:


This job is responsible for partnering with engineering and technology teams to implement measures as prescribed by lead/senior SREs. Key responsibilities include ensuring appropriate instrumentation, tooling, ticketing, alerting and on-call routines are in place for key services, identifying root causes of issues through production triage efforts, and suggesting code enhancements to technology teams to automate services and improve reliability and efficiency. Job expectations include using software development skills to improve efficiency and to address gaps in reliability.

Overview:

Site Reliability Engineer II (Hadoop Admin) role supporting NextGen Platforms built around Big Data Technologies (AI/ML, Hadoop, Jupyter Notebook, Spark, Kafka, Impala, HBase, Docker-Container, Ansible and many more). Requires experience in cluster management of vendor-based Hadoop and Data Science (AI/ML) products such as C3, Cloudera, Talend, Trifacta, Selerity, ELK, KPMG Ignite, etc. The analyst is involved in the full life cycle of an application and is part of an agile development process. The role requires the ability to interact, develop, engineer, and communicate collaboratively at the highest technical levels with clients, development teams, vendors and other partners. The following section is intended to serve as a general guideline for each relative dimension of project complexity, responsibility, and education/experience within this role.

Responsibilities:

  • Develops and maintains reliability scripts, tools and libraries and leverages them for common instrumentation, automation, and operational needs, and when mentoring Site Reliability Engineer (SRE) resources on reliability practices and established tools/capabilities
  • Collaborates with Development and Infrastructure teams to understand technical solutions and implement monitoring capabilities outlined in the application and system monitoring designs put forward by the SRE Lead
  • Partners to implement code changes to make use of common reliability libraries and tools and helps Application Production Services and Application Development teammates understand how to use them
  • Identifies vulnerabilities and opportunities for reliability improvement, such as investigating low-level error rates and 'noise' in monitoring, and defines solutions to reduce manual support effort and/or improve system reliability
  • Engages as a subject matter expert in major incident triage efforts and failure scenario modelling, and works with the Problem Manager to diagnose root causes for major incident/problem management investigations
  • Participates regularly in an on-call rotation with Production Support teammates to learn more about reliability issues affecting their portfolio
  • Works on complex, major or highly visible tasks in support of multiple projects that require multiple areas of expertise
  • Provides subject matter expertise in managing Hadoop and Data Science Platform operations with a focus on Cloudera Hadoop, Jupyter Notebook, OpenShift, Docker-Container Cluster Management and Administration
  • Integrates solutions with other applications and platforms outside the framework
  • Manages platform operations across all environments, including upgrades, bug fixes, deployments, metrics/monitoring for resolution and forecasting, disaster recovery, and incident/problem/capacity management
  • Serves as a liaison between client partners and vendors in coordination with project managers to provide technical solutions that address user needs

Required Qualifications:

  • 5+ years of combined Technology experience in an Enterprise environment
  • Experience with Docker, OpenShift/Kubernetes, databases (SQL, Cassandra, Postgres), and Jupyter Notebook
  • Strong technical knowledge of Unix/Linux, databases (Sybase/SQL/Oracle), Java, Python, Perl, shell scripting, and infrastructure
  • Experience in Monitoring & Alerting, and Job Scheduling Systems
  • Comfort with frequent, incremental code testing and deployment
  • Strong grasp of automation / DevOps tools – Ansible, Jenkins, SVN, Bitbucket

Desired Qualifications:

  • Bachelor’s degree or equivalent, preferably in a technical or engineering discipline
  • Cloudera Big Data Stack: Hadoop, Impala, Hive, Spark, Kafka, HBase

Skills:

  • Analytical Thinking
  • Automation
  • Collaboration
  • Production Support
  • Result Orientation
  • Application Development
  • Architecture
  • Influence
  • Project Management
  • Solution Design
  • Adaptability
  • DevOps Practices
  • Risk Management
  • Solution Delivery Process
  • Stakeholder Management

Shift:

1st shift (United States of America)

Hours Per Week: 

40
