Unanet

Site Reliability Engineer

Remote US
USD 120k - 130k
Terraform API Oracle Go Python AWS Docker Kubernetes Ansible Chef Shell
Description

As a member of our Cloud Engineering department, you will help us in our journey to transform into an enterprise SaaS company which hosts numerous top-tier customers on multiple complementary products. With a quickly growing customer base, we need creative and dynamic engineers to help architect and engineer innovative solutions that ensure the best possible experience for our customers.

You will join a team of talented, fast-moving engineers involved in multiple aspects of the SaaS delivery and customer experience lifecycle. We are looking for an engineer with a strong SRE background, one who has experience engineering services to ensure that we are proactive, efficient, and effective at operating our products with high availability. Your success will hinge on your ability to apply software engineering methods to the various operational needs of our cloud-hosted workloads, as well as on a firm grasp of automation, cloud architectures, monitoring and observability, fault tolerance, and engineering for scale across a diverse set of products. You should be passionate about solving problems and developing creative solutions leveraging automation.

What You’ll Do

  • Provision, configure, and maintain production environments that run several application stacks in the cloud and scale to meet the needs of a fast-growing customer base as well as our internal product team.
  • Automate all operational aspects of deployment, configuration, and management of software components as well as cloud infrastructure.
  • Define and implement monitoring, alerting, and SLIs/SLOs for platform components, infrastructure, and customer-facing applications.
  • Work proactively to prevent customer-impacting issues.
  • Investigate performance or reliability issues and partner with product development to remediate root causes.
  • Report on availability, system health, and operational metrics/trends.
  • Implement strategies around disaster recovery and security for all sub-systems in infrastructure (e.g., web servers, database, queues, storage, network).
  • Contribute to strategic and tactical plans for continued improvement of cloud architecture and operations.
  • Perform capacity management, load, and scalability planning.
  • Help drive process improvements for service management, including outage/incident management, rollbacks, health checks and reporting.
  • Assist management in development and optimization of operational cost models.
  • Assist in the enhancement of 24x7 performance monitoring, reporting, and response protocols.
  • Participate in a 24x7 on-call rotation as a member of the team.

Your First 90 Days

In your first 30 Days, as your familiarity with the product and pipeline grows, your responsibilities and influence will grow as well. You will immerse yourself in the daily operation of the production cloud environment, including provisioning new infrastructure, reviewing metrics and alerts, troubleshooting, and blameless incident postmortems. You will become familiar with our tech stack for each product as well as our management and observability tech stacks.

Within your first 60 Days, working with the rest of the Cloud Operations team, you will be responsible for identifying procedures currently handled manually or not fully automated. You will become familiar with existing automation and management services which can be enhanced. You will shadow other members of your team to gain understanding of each product’s unique architecture and management needs.

Within your first 90 Days, you will collaborate with our Director of Cloud Operations to define goals for the transition of Cloud Operations to a true SRE practice. Working with our Cloud Architecture team, you will identify the gaps between lower and upper delivery environments. You, along with the rest of our Cloud Operations team, will be responsible for supporting production environments.

About You

  • 3+ years of hands-on experience as a production SRE and/or DevOps Engineer focused on SRE areas
  • 3+ years of AWS DevOps experience, deploying, supporting, and managing applications
  • Experience with Docker and Kubernetes (EKS), including managing and troubleshooting environments of 500+ containers and/or 50+ namespaces
  • Extensive use of automation and configuration management tools such as Ansible or Chef, with an obsessive desire to automate
  • Prior experience with Terraform
  • Experience supporting and operating Oracle databases
  • Hands-on software development experience with applications and RESTful APIs architected for the public cloud
  • Performance optimization experience, including troubleshooting and resolving network and server latency issues, performing hardware evaluation/selection tasks, and performance vs. cost vs. time analysis
  • Proficiency with automation or scripting languages (e.g., Go, Python, Shell)
  • Working knowledge of Agile Development practices (e.g., Scrum, TDD)
  • Detail-oriented, with excellent documentation skills, and ability to successfully manage multiple priorities
  • Excellent troubleshooting skills that range from diagnosing hardware/software issues to large scale failures within a complex infrastructure

Your Differentiators

  • Bachelor’s degree in computer science, computer information systems, or equivalent
  • Experience implementing production Docker/Kubernetes environments
  • Experience with multi-region H/A and DR strategies
  • Experience deploying and maintaining infrastructure in AWS
  • Experience with chaos engineering and failure mode analysis
  • Experience with multiple database platforms and services (Relational, Document, Key-Value, Analytics, etc.)
  • Experience with Splunk (or other log aggregation tools), Grafana, and Prometheus

Our Values

  • We are a Team. Employees, customers, and partners working together.
  • We are Customer-Focused. Customers are the heart of everything we do.
  • We are Driven. Seeking exceptional outcomes.
  • We Own our Success. Every employee has a stake in our company.
  • We do the right thing and have fun in the process.

The salary range for this opportunity is $120,000 - $130,000 per year. You will be eligible for employee equity and discretionary bonus compensation, subject to plans that may be in effect from time to time. You will further be eligible to participate in Unanet’s employee benefits plans and programs. For more details on Unanet’s benefits offerings, please visit https://unanet.com/employee-benefits.  

Unanet is proud to be an Equal Opportunity Employer. Applicants will be considered for positions without regard to race, religion, sex, national origin, age, disability, veteran status or any other consideration made unlawful by applicable federal, state or local laws. 

