Company Description
Run:AI is bridging the gap between data science and computing infrastructure by creating a high-performance compute virtualization layer for deep learning, speeding the training of neural network models and enabling the development of large AI models. By abstracting workloads from underlying infrastructure, Run:AI creates a shared pool of resources that can be dynamically provisioned for full utilization of expensive GPU compute.
Job Description
The Run:AI product is a mixture of SaaS and on-prem deployments on top of Kubernetes. The DevOps Engineer will be responsible for the design, build and health of these deployments.
You will work with technologies and tools such as Kubernetes custom operators and controllers, admission controllers and webhooks, Helm, GitHub, GitHub Actions, ArgoCD and GitOps.
In your day-to-day work you will interact with the Engineering teams, Customer Success, Professional Services, and Pre-Sales, as well as the IT departments of our enterprise customers.
Responsibilities:
- Full end-to-end ownership of our entire cloud infrastructure, including individual development environments, build/CI servers, and production systems on various cloud environments.
- Design, build, and shape the architecture of deployments of Run:AI cloud-native products over a wide range of complex customer environments (on-premise, cloud, edge), constraints (e.g., the air-gapped installation variant), and K8s flavors (vanilla, cloud-managed, OpenShift, Rancher, Tanzu, and more).
- Troubleshoot production issues and tackle performance challenges.
- Collaborate with stakeholders to offer input on product direction and design.
- Continually evaluate tools and technologies to improve the overall release and product deployment processes.
Qualifications
- 3+ years of work experience as a DevOps engineer.
- Hands-on technical leadership in a large-scale software development environment.
- Key qualification: Kubernetes expertise, with 2+ years of hands-on experience with vanilla Kubernetes.
- Proficiency in Linux, Networking, Storage and Security.
- Extensive experience in managing a production environment, including monitoring and logging solutions.
- Excellent Bash/shell scripting skills, plus scripting in Python and Go.
- Strong software engineering skills in backend systems and databases.