Oomnitza

AI & Machine Learning Site Reliability Engineer

Remote Galway, Ireland
Hadoop TensorFlow Docker Python Machine Learning AWS GCP Azure Spark PyTorch Kubernetes Bash
Description
Oomnitza offers the industry’s most versatile Enterprise Technology Management platform that orchestrates and automates key business processes for IT. Our SaaS solution, with agentless integrations, best practices and low-code workflows, enables enterprises to leverage their existing infrastructure systems and automate processes such as offboarding, onboarding, audit readiness, refresh forecasting and more, thereby reducing reliance on error-prone manual tasks and tickets. We help some of the most well-known and innovative companies to improve efficiency, expedite audits, mitigate cyber risk and eliminate redundant IT spend. 

Team Oomnitza are seeking an experienced AI & ML Site Reliability Engineer who is passionate about AI, machine learning, and data science to support our innovations in AI and Data product management. In this role, you will be responsible for architecting and maintaining infrastructure that supports machine learning (ML), artificial intelligence (AI), and data-driven solutions. You will help stand up the foundational systems that enable large-scale AI deployment, including developing and managing Oomnitza’s big data analytics platform, developing AI architecture, implementing vector databases, building knowledge graphs, and optimizing systems for ML model deployment and inference. You will collaborate closely with data scientists, infrastructure engineers, product management teams, and UX designers to streamline workflows, ensure scalability, and manage the complete lifecycle of AI systems from development to production, so that our customers realize meaningful business value.

Responsibilities

  • Big Data Analytics Platform: Build and maintain Oomnitza’s big data analytics platform that centralizes data from multiple customer instances and serves analytics and AI solutions.
  • AI/ML Architecture & Infrastructure Development: Design and build scalable, secure, and efficient AI infrastructure to support training and deploying machine learning models and AI software solutions.
  • Vector Databases & Knowledge Graphs: Implement and manage vector databases for storing high-dimensional data and knowledge graphs to integrate structured and unstructured data.
  • Retrieval Augmented Generation (RAG) & GraphRAG: Develop and integrate retrieval-augmented generation systems for more accurate, scalable, and context-aware models, including GraphRAG for advanced reasoning.
  • LLM Fine-Tuning, Transfer Learning & Optimization: Work with data scientists to train, optimize, and fine-tune large language models (LLMs) for specific business applications and ensure seamless integration with existing systems.
  • ML Model Deployment & Orchestration: Deploy, manage, and monitor ML models in production, ensuring system reliability, scalability, and performance.
  • CI/CD for Machine Learning Pipelines: Implement continuous integration and continuous deployment (CI/CD) processes tailored for machine learning, ensuring reproducibility and automation.
  • Agent Development & Automation: Work with data scientists and the AI product management team to develop and manage AI agents for task automation, process optimization, and adaptive learning systems.
  • Model Monitoring & Governance: Ensure model performance monitoring, retraining, and governance protocols are in place for reliable and ethical AI usage.
  • Collaboration & Team Support: Work closely with data scientists, ML engineers, and cross-functional teams to support development, testing, and deployment needs.

Qualifications

  • Education: Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field 
  • Experience: 5+ years of experience in site reliability engineering, DevOps, MLOps, or a similar role. Experience with cloud platforms such as AWS, GCP, or Azure, including AI/ML services (e.g., SageMaker, Google Colab, Vertex AI). Proficient in deploying machine learning models such as regressions, decision trees, neural networks, and recommendation systems into production and managing the model lifecycle.
  • Technical Skills: Experience with data processing tools such as Apache Spark, Hadoop, or Airflow for large-scale data processing. Experience with AI/ML tools and frameworks (e.g., TensorFlow, PyTorch, LangChain, Hugging Face). Strong understanding of vector databases (e.g., Pinecone, Milvus, Chroma) and knowledge graph tools (e.g., Neo4j, RDF). Experience with RAG (Retrieval-Augmented Generation) techniques and GraphRAG systems. Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes). Proficiency in programming languages such as Python and Bash, and experience with ML tools and libraries. Experience implementing CI/CD for ML pipelines and working with ML version control systems (e.g., DVC, MLflow). Experience in on-call incident response in high-uptime environments.
  • Behavioural Skills: Intellectual curiosity with a hunger to know how things work and question established ideas, concepts and frameworks
  • Spirit of service, with a “how can I serve” attitude centered on delivering value to the greater team, the overall company, and our broader community of customers
  • Ability to embrace ambiguity and apply structured thinking and problem-solving skills
  • Entrepreneurial spirit with an enthusiasm to take on new challenges
  • Excellent communication and collaboration skills

Additional (Preferred) Qualifications

What We Can Offer You

  • Healthcare for dependents and spouse 
  • A progressive, healthy work culture with excellent opportunities for professional and personal development.  
  • Top performers will have an opportunity to help shape the team, working directly with the founders to drive initiatives and create a structure that scales.
  • A once-in-a-lifetime career opportunity to get on board a fast-growing business that is venture-backed by C5 Capital, Shasta Ventures, Riverside Acceleration Capital, and Hummer Winblad.

Our Benefits Package

  • Dental & Vision Insurance 
  • Employee equity plan
  • Health Insurance for your spouse and dependents 
  • Pension, Life insurance and Income protection
  • Remote working & flexible work schedules
  • Working from home equipment allowance
  • Choice of preferred equipment, Mac or PC.
  • Regular, fun social events and workshops.


Oomnitza recruits, employs, trains, compensates and promotes regardless of race, religion, color, national origin, sex, disability, age, veteran status, and other protected status as required by applicable law.
