Qualys

Site Reliability Engineer

Pune, India
Python, API, Ansible, Redis, Cassandra, Bash, SQL, Elasticsearch, Kubernetes, Java, Terraform, Chef, Go, Kafka, Oracle, Puppet
Description

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Site Reliability Engineer, Cloud Platform

Co-develop and participate in the full lifecycle of cloud platform services, from inception and design through deployment, operation, and improvement, by applying scientific principles.


•   Increase the effectiveness, reliability, and performance of cloud platform technologies by identifying and measuring key indicators, making changes to the production systems in an automated way and evaluating the results.
•   Support the cloud platform team before technologies are pushed for production release through activities such as system design, capacity planning, automation of key deployments, building a strategy for production monitoring and alerting, and participating in the testing/verification process.
•   Ensure that the cloud platform technologies are maintained properly by measuring and monitoring availability, latency, performance and system health.
•   Advise the cloud platform team on improving the reliability of the systems in production and scaling them based on need.
•   Participate in the development process by supporting new features, services, and releases, and hold an ownership mindset for the cloud platform technologies.
•   Develop tools and automate processes for large-scale provisioning and deployment of cloud platform technologies.
•   Participate in the on-call rotation for cloud platform technologies. During incidents, lead incident response and contribute to detailed postmortem analysis reports that are brutally honest and blameless.
•   Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting, and root cause analysis.


Requirements
 
•   4+ years of relevant experience in running distributed systems at scale in production.
•   Expertise in one of the following programming languages: Java, Python, or Go.
•   Understanding of RESTful APIs and UIs.
•   Proficiency in writing Bash scripts.
•   Good understanding of SQL and NoSQL systems and SQL query plans.
•   Good understanding of Elasticsearch schema and queries.
•   Good understanding of container and orchestration technologies such as Kubernetes.
•   Experience with managing large-scale deployments of message-oriented middleware such as Kafka.
•   Experience with Redis caching.
•   Good understanding of systems programming (network stack, file system, OS services)
•   Understanding of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL, VLANs, etc.
•   Skilled in identifying performance bottlenecks, identifying anomalous system behavior, and determining the root cause of incidents.
•   Knowledge of JVM concepts like garbage collection, heap, stack, profiling, class loading, thread dump analysis etc.
•   Knowledge of best practices related to security, performance, high-availability, and disaster recovery.
•   Proven record of handling production issues, planning escalation procedures, and conducting postmortems, impact analyses, risk assessments, and other related procedures.
•   Able to drive results and set priorities independently.
•   BS/MS degree in Computer Science, Applied Math, or a related field.


Bonus Points if you have:

•   Experience with managing large-scale deployments of RDBMS systems such as Oracle
•   Experience with managing large-scale deployments of NoSQL databases such as Cassandra
•   Experience with monitoring tools such as Graphite, Grafana and Prometheus
•   Experience with Hashicorp technologies such as Consul, Vault, Terraform and Vagrant
•   Functional automation frameworks
•   Root cause analysis 
•   Experience with configuration management tools such as Chef, Puppet or Ansible
•   In-depth experience with continuous integration and continuous deployment pipelines
•   Exposure to Maven, Ant or Gradle for builds 
•   Knowledge of regular expressions and data structures

Qualys
Business Process Automation (BPA) Compliance Security Software