Description
- Lead a team of SREs responsible for Kubernetes infrastructure, helping them achieve their personal and shared goals and thrive in their roles, while spearheading engineering efforts by your own example.
- Ensure maximum availability, reliability, and scalability of our multi-datacenter hybrid Linux environments (10 clusters, 1,000+ nodes, 20k+ pods, 1M+ QPS).
- Conduct performance and resilience testing, which may include reviewing configurations, software choices and versions, hardware specs, etc.
- Advance our technology stack with innovative ideas and new creative solutions.
- Participate in capacity management of core systems and services, application analysis, and performance and security tuning.
- Create strategies for long-term fixes to critical production incidents.
- Maintain documentation, build tooling, and create alerts to identify and address infrastructure reliability issues.
- Proactively identify system anomalies.
- Conduct post-mortems and communicate impact and remediation strategies to service owners and C-level staff.
- East Coast U.S. hours 9am-6pm EST preferred
- Also open to PST hours
Requirements:
- Minimum 10 years of relevant experience
- Minimum 3 years of management experience
- Thorough understanding of Linux (we use CentOS/Rocky).
- Advanced knowledge of K8s and its ecosystem:
  - Extensive hands-on experience deploying and managing Kubernetes clusters on bare-metal servers in production environments.
  - Expert knowledge of best practices for architecting cross-datacenter Kubernetes clusters running on-premises, with automated management using kubeadm.
  - Know-how in optimizing bare-metal server configuration for Kubernetes workloads, including networking, storage, and security considerations.
  - Experience tuning the kubelet and CRI in accordance with best practices, including but not limited to NUMA and GPU optimization.
  - Deep knowledge of Kubernetes internals, including etcd, Container Network Interface (CNI) plugins, and container runtimes.
  - Thorough understanding of PKI certificates for all components (ability to manually troubleshoot and resolve client, server, and control-plane certificate issues within Kubernetes with zero downtime).
  - Extensive experience developing custom Kubernetes operators and autoscalers, as well as tailored ingress/egress controllers and custom resource definitions (CRDs).
  - Fluency in GitOps automation tools (Flux v1/v2) and comprehensive knowledge of the Helm and Kustomize controllers.
  - Ability to manage BGP configuration, with mastery of kube-router, GoBGP, and MetalLB.
  - Detailed understanding of the Rook/Ceph implementation for Kubernetes.
  - Profound knowledge of Docker (dockershim), containerd, and runc internals at the kernel level.
- Deep understanding of the Puppet configuration management toolset (experience with Chef, CFEngine, or Salt also works).
- Experience administering NoSQL databases (Redis, Elasticsearch).
- Experience with scalable infrastructure monitoring solutions such as Icinga, Prometheus, ELK.
- Strong scripting and automation skills using languages like Python, Ruby, Java, or Go.
- Advanced understanding of networking concepts (TCP/IP stack, BGP, DNS, CDN, load balancing).
- Close familiarity with Cassandra at scale.
- Understanding of Kafka architecture.
- Experience in AdTech or High-Frequency Trading is a plus.
- Experience with security-related best practices is a plus.
Benefits:
- Comprehensive healthcare with medical, dental, and vision options, and 100%-paid life & disability insurance
- 401(k) Match
- Generous paid vacation and sick time
- Paid parental leave & adoption assistance
- Annual tuition assistance
- Better Yourself Wellness program
- Group volunteer opportunities and fun events
- A referral bonus program: we love hiring referrals here at PulsePoint