Infrastructure - DevOps & Site Reliability Engineer

Remote New York, NY
USD 130k - 170k
Elasticsearch Redis AWS Docker Python Bash Terraform Streaming Microservices Kubernetes
This job is closed!

Blackbird.AI helps organizations discover emergent threats and stay one step ahead of real-world harm through our AI-powered Narrative and Risk Intelligence Platform. Our commitment is to prioritize safety and security, providing the tools to identify potential risks and ensure a safer environment proactively. No matter the job or where it’s located, we’re all connected by a shared vision: To lead and enhance the landscape of risk intelligence.

Reporting to the Head of Engineering, you will own the infrastructure architecture for a real-time streaming cloud-hosted analytics platform and help Blackbird.AI establish a solid foundation for the deployment of different microservices, databases, and frameworks, along with performance monitoring tools, continuous integration, and deployment pipelines.

The DevOps & Site Reliability Engineer will be the driving force behind the architecture of our cutting-edge platform. You’ll work on Linux servers, ensuring their optimal performance. Your expertise in Kubernetes, Docker, and scripting languages will be instrumental in engineering fault tolerance and crafting efficient deployment scripts. Crucially, in this role, you will maintain and optimize essential components like Elasticsearch and Prometheus, all while upholding rigorous security standards.

As the DevOps & Site Reliability Engineer, you’ll have the chance to:

  • Manage both self-hosted and AWS-hosted Linux servers.
  • Demonstrate proactive troubleshooting and clear communication during server infrastructure deployments.
  • Create and maintain Kubernetes clusters that house multiple databases and ETL processes.
  • Engineer fault tolerance mechanisms, backup procedures, and data retention policies.
  • Develop deployment and rollout scripts for seamless processes.
  • Monitor and scale various databases to optimize performance.
  • Develop monitoring and telemetry as needed to ensure comprehensive system observability and alert generation.
  • Maintain diverse web applications through web servers and ingresses.
  • Scale and manage multiple deployments, including Elasticsearch, PostgreSQL, Redis, and more.
  • Support “security by design” to meet infosec objectives, conduct security audits, and oversee security-related aspects such as TLS and firewalls.
  • Automate deployments that are cloud-agnostic and adaptable to AWS or on-premise environments.
  • Collaborate with the data engineering and full-stack development teams to uphold best practices in stack selection and deployment.

What you’ll bring:

  • Bachelor’s degree in Computer Science or equivalent.
  • Proven track record of successfully deploying cloud-hosted SaaS products, with an emphasis on horizontal scalability and distribution.
  • Expert-level proficiency in Linux systems.
  • Mastery of the Kubernetes ecosystem, including Helm, and Docker containers.
  • Proficiency in Python.
  • Strong knowledge of web servers and security-related concepts.
  • Demonstrated experience in building and maintaining Prometheus and Grafana, and in establishing infrastructure monitoring.
  • Good familiarity with Elasticsearch and Loki for log monitoring.
  • Solid background in addressing infrastructure security concerns.
  • 2+ years of hands-on experience in developing with Python and Bash.
  • Familiarity with infrastructure as code tools, such as Terraform or equivalent.
  • Experience in managing secret stores, including Vault or similar solutions.
  • Expertise in build automation, continuous integration, and continuous deployment tools.
  • Experience working with cloud-based services (similar to AWS S3, CloudFront, Route53, and ElastiCache).
  • Proven track record of collaborating with distributed teams.

Helpful to have:

  • Experience with MLOps frameworks like Kubeflow, Seldon, or similar.
  • Proficiency in building Kubernetes operators using Golang.
  • Technical background or experience in AI/ML deployments.
  • Experience with multi-tenant deployments in AWS or similar environments.
  • Experience with obtaining certifications such as SOC 2 and FedRAMP.
  • Familiarity with mainstream ETL tools, such as Airflow or equivalent.
  • Experience handling massive datasets on the order of terabytes.

We’ve outlined specific skills, experience, and requirements for this position, but don’t stress if you don’t meet every single one. Our Talent Team is dedicated to discovering exceptional individuals, and they might identify a relevant aspect of your background that suits this role or another opportunity within Blackbird.AI.

If you have a passion for the role, please still apply.

What’s in it for you:

Blackbird.AI is embarking on an exciting growth journey with numerous opportunities for career development within the company. You will join a nurturing, inclusive, and experienced team.

Join us as we soar to new heights!


At Blackbird.AI, our core values shape how we work and make decisions. Our values inspire us to be authentic and continue improving.

We embrace a strong sense of responsibility to society, recognizing the vital role our services play in empowering governments, communities, and individuals to foster critical thinking and empowerment. We believe in integrating personal and professional lives with societal needs, emphasizing the importance of creating an environment that attracts top talent and provides substantial growth opportunities. We are motivated by the potential of science and technology to impact humanity positively.

Artificial Intelligence Enterprise Software Homeland Security Journalism Machine Learning National Security Natural Language Processing Predictive Analytics Public Safety SaaS
