WeightsBiases

Senior Solutions Architect, Platform Infrastructure - PST (Remote)

Remote San Francisco, CA
USD 209k - 209k
GCP Python Deep Learning Kubernetes Terraform Azure Docker MySQL AWS
Description
At Weights & Biases, our mission is to build the best tools for AI developers. We founded our company on the insight that while there were excellent tools for developers to build better code, there were no similarly great tools to help ML practitioners build better models. Starting with our first experiment tracking product, we have since expanded our solution into a comprehensive AI developer platform for organizations focused on building their own deep learning models and generative AI applications.

Weights & Biases is a Series C company with $250M in funding and over 200 employees. We proudly serve over 1,000 customers and more than 30 foundation model builders, including OpenAI, NVIDIA, Microsoft, and Toyota.

The Senior Solutions Architect role at Weights & Biases is a unique hybrid, blending the technical expertise of a Site Reliability Engineer (SRE) with the communication and advisory skills of a Solutions Architect. In this role, you will focus on all aspects of the Weights & Biases Platform, managing customer deployments across various cloud infrastructures and on-prem environments to ensure scalability, reliability, and operational excellence.

You will work closely with customers to debug issues, provide best practices, and help them unlock the full potential of Weights & Biases. Additionally, you will produce technical content such as blog posts, documentation updates, and internal enablement material to support the Field Engineering team. This role requires deep collaboration with Support, Product, and Engineering teams to drive product improvements based on customer insights.

Responsibilities:

  • Deployment & Operations:
      • Work with customer operations teams to provision Weights & Biases services in Dedicated Cloud, Private Cloud, and on-prem environments.
      • Manage complex infrastructure implementations, partnering with highly skilled customer engineers.
      • Monitor and ensure the reliability, performance, and scalability of customer deployments using SRE best practices.
  • Debugging & Troubleshooting:
      • Diagnose and resolve issues in customer environments, documenting resolutions to accelerate future problem-solving.
      • Provide hands-on support for containerized and distributed systems using Docker, Kubernetes, and related technologies.
  • Customer Engagement:
      • Lead technical discussions with customers, acting as a trusted advisor for infrastructure reliability and operational excellence.
      • Deliver training sessions, product demos, and workshops to help customers maximize the value of Weights & Biases.
      • Collaborate with customers to uncover desired outcomes and recommend solutions tailored to their needs.
  • Enablement & Collaboration:
      • Partner with AI Solution Engineers to streamline post-sales processes, including onboarding, adoption, and training.
      • Collaborate with Sales Engineering to ensure a seamless transition from POC to onboarding.
      • Provide insights to the Product team based on customer feedback to influence the product roadmap.

Requirements:

  • Based in the Pacific Standard Time (PST) timezone.
  • A proven track record of systematically diagnosing and resolving infrastructure issues.
  • Prior experience in a customer-facing technical role.
  • Expertise with Docker, Kubernetes, Helm charts, networking, and cloud-managed services (e.g., MySQL, Object Stores).
  • Strong fundamentals in Infrastructure as Code (IaC), preferably Terraform.
  • Proficiency with at least one cloud platform (AWS, GCP, Azure); experience with multiple platforms is a plus.
  • Strong Linux/Unix command line experience.
  • Basic proficiency in Python and familiarity with ML workflows or tools.
  • Exceptional communication skills, both written and verbal, with the ability to simplify complex topics for diverse audiences.
  • Proven ability to prioritize and manage multiple competing tasks in a dynamic environment.

Strong plus:

  • Deep proficiency in Kubernetes design patterns, including Operators.
  • Familiarity with data engineering and MLOps tooling.
  • Experience as an educator or facilitator for technical training sessions, workshops, or demos.
  • SaaS, web service, or distributed systems operations experience.

Our Benefits:

  • 🏝️ Flexible time off
  • 🩺 Medical, dental, and vision coverage for employees and their families
  • 🏠 Remote-first culture with in-office flexibility in San Francisco
  • 💵 Home office budget with a new high-powered laptop
  • 🥇 Truly competitive salary and equity
  • 🚼 12 weeks of Parental leave (U.S. specific)
  • 📈 401(k) (U.S. specific)
  • Supplemental benefits may be available depending on your location
  • Explore benefits by country
We encourage you to apply even if your experience doesn't perfectly align with the job description as we seek out diverse and creative perspectives. Team members who love to learn and collaborate in an inclusive environment will flourish with us. We are an equal opportunity employer and do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status. If you need additional accommodations to feel comfortable during your interview process, reach out at careers@wandb.com.
