Teraswitch

Senior Infrastructure Engineer

Pittsburgh, PA
KVM Bash Python Ceph S3 NVMe TCP Ansible Kubernetes Prometheus Grafana OpenTelemetry Nginx HAProxy
Description

Senior Infrastructure Engineer (KVM Compute / Distributed Storage)

Department: Infrastructure Engineering

Location: Pittsburgh, PA or Hybrid

Employment Type: Full-Time

Engineered to outperform, Teraswitch is on a mission to provide high-performance infrastructure services for critical workloads. With 20+ datacenter locations around the world interconnected by our low-latency global backbone network, we are the class leader in high-performance bare metal hosting and are rapidly expanding into additional infrastructure services.

The Job

The Infrastructure Engineering team at Teraswitch is responsible for the compute, storage, and platform infrastructure that powers our products and internal operations.

This senior/staff-level role is focused on building provider-grade hosted compute and storage services, specifically a KVM-based VM product and a distributed storage product offering object (S3) and block (NVMe/TCP) protocols. Qualified candidates will have depth in at least one of these areas. You will help architect and build cloud-scale, globally distributed products for a high-performance infrastructure provider, with an emphasis on automation, scalability, and security by design.

While this role has a compute and storage services focus, as a senior member of the Infrastructure Engineering team, you’ll also be expected to cross-train and contribute broadly across infrastructure domains as we grow the team.

What You’ll Do

  • Design and implement provider-scale, globally distributed hosted services, with a focus on compute (KVM-based cloud), storage (distributed object and block services), or both

    • Compute track: Evaluate, design, implement, and manage a KVM-based cloud compute platform

    • Storage track: Evaluate, implement, and manage a distributed storage platform (Ceph, Weka, VAST, etc.) that supports object (S3) and block (NVMe/TCP) protocols

  • Define provisioning workflows, node/fleet management, and scalable operations

  • Integrate service networking primitives (IPAM, DHCP, DNS) and customer interfaces to the product

  • Design multi-tenant provisioning and controls: isolation boundaries, quotas/limits, metering, and security

  • Build automation and tooling for global deployments of these products: upgrades, capacity expansion, failure handling, rebalancing

  • Implement robust observability for these products to enhance production service reliability (metrics, logs, traces; dashboards; actionable alerting)

  • Collaborate with the Software team to integrate these products with our customer control plane (portal, API) and billing systems, ensuring robust customer-driven lifecycle management

  • Cross-train with the rest of the Infrastructure Engineering team and contribute broadly to the compute, storage, and platform infrastructure that powers Teraswitch products and internal operations

Basic Qualifications

  • Strong Linux systems and networking expertise, with production operations experience

  • Depth in at least one of the following:

    • Compute / virtualization: KVM/QEMU, libvirt and/or platforms such as Proxmox/OpenStack; image pipelines; fleet operations; multi-tenant considerations

    • Distributed storage services: experience with distributed storage platforms (Ceph, VAST, Weka, or similar) and/or managing block/object storage offerings; public/multi-tenant deployment experience is a plus

  • Automation: experience in scripting (Python, Bash, etc.) and/or configuration management (Ansible or similar)

  • Experience with observability/monitoring systems (metrics, logs, traces, alerting) and using them to enhance production service reliability

  • Comfortable working in a fast-paced, results-oriented environment

  • Committed to operational best practices and security by design

Preferred Skills/Experience

You do not need all of these—depth in a few areas plus strong fundamentals is sufficient:

  • Service / hosting provider experience (multi-tenant systems, automation-first operations, scalable and secure design)

  • Experience with VPS/KVM hosting at scale, including networking and security

  • Experience with distributed storage systems such as Ceph, Weka, or VAST, particularly in a service provider environment

  • Expertise in object storage / S3 services - gateway/front-door patterns (F5/Nginx/HAProxy), networking, durability, security

  • Strong networking fundamentals relevant to provider environments (routing/segmentation, IPAM/DHCP/DNS integration)

  • Cloud-native observability/monitoring (e.g. Prometheus, Grafana, OpenTelemetry)

  • Kubernetes and cloud-native (CNCF) ecosystem experience

  • Demonstrated ability to design and operate automation-first infrastructure at scale

  • Experience in other Infrastructure team domains - e.g. self-hosted Kubernetes deployment / management, and/or bare metal automation and fleet management

On-Call / Operations

Participate in an on-call rotation supporting critical production systems.

Location

Preference given to full-time onsite candidates in Pittsburgh, PA, followed by hybrid candidates.

Compensation and Benefits

Along with a competitive pay scale, full-time Teraswitch employees are eligible for the following benefits:

  • Health, Dental, and Vision Insurance

  • 401(k) with company profit sharing

  • PTO and 11 Company Paid Holidays
