Samsung Electronics

Senior Chief Engineer, SRE

Bengaluru, India; Phoenix, AZ
GCP, MySQL, AWS, Kubernetes, Shell, Terraform, Ansible, Python
Description

Position Summary

Site Reliability Engineer
Site reliability engineers are dedicated full-time to building software that improves the reliability of production systems, fixing issues, responding to incidents, and usually carrying on-call responsibilities. They operate systems efficiently and systematically through continuous monitoring and improvement, automation of system and service operations, and consistent application of process.

Building software to help operations and support teams
SRE teams are responsible for proactively building and implementing services that help IT and support teams do their jobs better. This can be anything from adjustments to monitoring and alerting to code changes in production. A site reliability engineer may be tasked with building a homegrown tool from scratch to address weaknesses in software delivery or incident management.

Role and Responsibilities

Fixing support escalation issues
As with the point above, a site reliability engineer can expect to spend time fixing support escalation cases. However, as SRE operations mature, systems become more reliable and fewer critical incidents occur in production, which leads to fewer support escalations. Because an SRE team touches so many parts of the engineering and IT organization, it can be a great source of knowledge and can help route issues to the right people and teams.

Optimizing on-call rotations and processes
More often than not, site reliability engineers will need to share on-call responsibilities. At most organizations, the SRE role has a strong say in how the team can improve system reliability by optimizing on-call processes. SRE teams help add automation and context to alerts, leading to better real-time collaboration among on-call responders. Additionally, site reliability engineers can update runbooks, tools, and documentation to help prepare on-call teams for future incidents.

Documenting “tribal” knowledge
SRE teams gain exposure to systems in both staging and production, and to every technical team. They take part in software development, support, IT operations, and on-call duties, so they build up a great deal of historical knowledge over time. Instead of leaving this knowledge siloed in the mind of one team or one person, site reliability engineers can be tasked with documenting much of what they know. Ongoing upkeep of documentation and runbooks helps ensure that teams get the information they need right when they need it.

Conducting post-incident reviews
Without thorough post-incident reviews, there is no way to identify what’s working and what’s not. SRE teams need to keep teams honest and ensure that everyone, software developers and IT professionals alike, is conducting post-incident reviews, documenting findings, and acting on what they learn. Site reliability engineers are then often tasked with action items for building or optimizing some part of the SDLC or incident lifecycle to bolster the reliability of their service.

Skills and Qualifications


Primary skill sets (5-10 years)
• Public cloud: AWS, Kubernetes

• Scripting and automation: Shell, Terraform, Ansible, Python, Jenkins, Spinnaker, CI/CD

• Knowledge of installing, configuring, and managing public cloud infrastructure on AWS and GCP using Terraform and Ansible

• Operate systems efficiently and systematically through continuous monitoring and improvement, automation of system and service operations, and consistent application of process

• Experienced professional with a full understanding of specialized areas; resolves a wide range of issues in creative ways

• Works on problems of diverse scope where analyzing data requires evaluating identifiable factors. Demonstrates good judgement in selecting methods and techniques for obtaining solutions

• Normally receives little instruction on day-to-day work and receives general instructions on new assignments

• Monitor server applications and infrastructure 24 hours a day and handle faults

• Automate system and service operations to improve cost-effectiveness

• Typically requires a minimum of 10 years of related experience and a Bachelor's degree, or 3 years and a Master's degree

• Good command of English

Secondary: monitoring using Grafana, Prometheus, InfluxDB, TSDB (2-4 years)
Desired: MySQL, NoSQL, time-series databases