Tabby

Senior Machine Learning/Data Operations Engineer, Infrastructure (Remote)

Remote Belgrade
Docker Kubernetes Kafka Microservices Terraform API PostgreSQL Ansible GCP Go
Description

Senior ML/Data Ops Engineer II

Department: Infrastructure

Employment Type: Full Time

Location: Belgrade / Remote


Tabby creates financial freedom in the way people shop, earn and save by reshaping their relationship with money. Over 15 million users choose Tabby to stay in control of their spending and make the most out of their money.

The company’s flagship offering allows shoppers to split their payments online and in-store with no interest or fees. Over 40,000 global brands and small businesses, including Amazon, Noon, IKEA, and SHEIN, use Tabby to accelerate growth and gain loyal customers by offering easy and flexible payments online and in stores.

Tabby generates over $10 billion in annual transaction volume for its partner brands and is the highest-rated, most-reviewed, largest, and fastest-growing FinTech in the GCC region.

Tabby launched in 2019, has since raised over $1 billion in equity and debt funding from global and regional investors, and is now valued at $4.5 billion.


Key Skills and Responsibilities

LLM Serving & Model Management:
  • Deep expertise in high-throughput serving using vLLM, NVIDIA TensorRT-LLM, and sglang to minimize latency and maximize hardware efficiency.
  • Hands-on experience deploying and optimizing large-scale open-weights models, specifically DeepSeek 3.1/3.2, Qwen, and GPT-OSS variants.
  • Advanced optimization and security hardening of Docker specifically for GPU environments.
  • Managing model weights and orchestration within Kubernetes (GKE) environments.

Real-Time Data Engineering & CDC:
  • Designing and maintaining high-throughput CDC (Change Data Capture) pipelines using the Apache ecosystem (e.g., Debezium, Kafka) to sync data from Cloud PostgreSQL.
  • Deploying and tuning ClickHouse for real-time analytics, ML feature storage, and high-speed logging.
  • Orchestrating complex ML data workflows using Airflow (Google Cloud Composer) to ensure data reliability.

Core Infrastructure & Networking:
  • Strong Linux systems expertise including internals, networking, and performance tuning for large-scale distributed systems.
  • Experience with Istio service mesh to manage microservices communication and traffic.
  • Provisioning and maintaining dedicated GPU nodes (A100/H100/H200/B200), including driver management and OS-level tuning using Ansible.
  • Solid Kubernetes expertise: controllers, CRDs, CNI, and Ingress.

CI/CD & Tooling:
  • Implementing pipelines as code within GitLab CI, managing runners, caching, and security scanning.
  • Infrastructure as Code with Terraform and Terragrunt.
  • Proficiency in Python/Bash for building custom automation and AI Agent tooling.

Load Testing & Observability:
  • Conducting rigorous load testing for GenAI applications, focusing on metrics like TTFT, TPS, and RPS.
  • Deploying and managing LiteLLM Gateway for unified API access, load balancing, and cost tracking.
  • Experience with Datadog for monitoring GPU utilization, inference health, and log pipelines.

Soft Skills:
  • Strong ownership mindset: balancing speed, reliability, and cost.
  • Comfortable working cross-functionally with developers, security, and compliance.
  • Excellent sense of responsibility and accountability.
  • English B2 or higher.

Nice to Have:
  • Experience with PCI-DSS, SOC2, or other regulated compliance environments.

Our Tech Stack: Linux, Docker, Kubernetes, GCP (GKE, Cloud PostgreSQL), Datadog, GitLab, Apache CDC, ClickHouse, Airflow, Istio, Terraform, Terragrunt, Ansible, vLLM, TensorRT-LLM, sglang, LiteLLM, DeepSeek, Qwen, Go, Python

What we offer

  • Full-time B2B contract
  • Fully remote setup, work from anywhere in Europe
  • Up to 20% tax allowance
  • 22 paid leave days annually
  • Stock options (ESOP) in a fast-scaling, pre-IPO company
  • Flexi benefits you can use for wellness, travel, or learning
  • Work alongside a high-performing, international engineering team in a global fintech unicorn

Relocation support is available to our hubs in Armenia, Georgia, Serbia, and Spain, including flights, temporary accommodation, and legal setup.

