- Design and Build Data Architecture: Architect and implement Allocate's data lakehouse on AWS – combining data lake storage and warehouse technologies to store diverse financial datasets. Develop a knowledge graph to model key relationships (investors, funds, companies, etc.) and integrate a vector database for storing embeddings, enabling semantic search and retrieval for our AI agents across models and providers.
- Develop Data Pipelines: Create robust ETL/ELT pipelines to ingest, clean, and transform data from various sources (internal application data and third-party APIs). Ensure both batch processing and real-time data streaming are handled to support up-to-date analytics and recommendations. Build pipelines with an eye toward scalability (able to handle increasing data volume and complexity) and reliability (proper error handling and monitoring); a minimal ingestion sketch follows this list.
- Enable AI/ML Capabilities: Work closely with our data science and engineering team to provision the data and infrastructure needed for machine learning models and AI features. This includes preparing training datasets, setting up feature stores, and orchestrating workflows that feed LLM-based agents with the context they need (e.g. retrieving relevant data via vector similarity search; an illustrative retrieval sketch also follows this list). You will also implement systems to serve AI model outputs (such as recommendations) back into the product in real time.
- Technical Leadership & Collaboration: Serve as the subject matter expert for data engineering and AI infrastructure within Allocate. Provide architectural guidance and best practices to engineers who consume data in their services. Work in cross-functional squads to incorporate data-driven features into the product roadmap. As we grow, help mentor junior engineers and potentially lead a small "AI Data" team, setting coding standards and fostering a culture of data excellence.
- Infrastructure & DevOps: Collaborate with our DevOps engineers to deploy and maintain data services. Containerize and orchestrate data tools (using Docker/Kubernetes on AWS EKS) for production use. Implement CI/CD pipelines for data workflows, so that changes to data processing or models are tested and deployed automatically. Monitor the health and performance of our data platforms (setting up alerts and dashboards) and be ready to troubleshoot and resolve issues in a production environment to ensure uptime of critical data and AI services.
- Continuous Improvement: Stay up-to-date with the latest in data engineering and AI (from new AWS offerings to open-source ML tools). Evaluate and recommend new technologies – for example, assessing if a stream processing platform like Kafka/Kinesis or an orchestration tool like Airflow could improve pipeline reliability. Challenge conventions and innovate: we encourage rethinking how things are done as we push to build a world-class, intelligent platform.
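To make the pipeline expectations above concrete, here is a minimal sketch of a batch ingestion step with retries, validation, and dead-lettering. It is illustrative only: the `fetch` callable, the `allocate-lake` bucket, and the record schema are hypothetical placeholders, not Allocate's actual systems.

```python
import json
import logging
import time
from datetime import date, datetime, timezone

import boto3  # AWS SDK for Python

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ingest")

s3 = boto3.client("s3")
BUCKET = "allocate-lake"  # hypothetical raw-zone bucket


def fetch_with_retries(fetch, retries: int = 3, backoff: float = 2.0) -> list[dict]:
    """Call a third-party API, retrying with exponential backoff on failure."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except Exception as exc:  # narrow to the client's error types in practice
            log.warning("fetch failed (attempt %d/%d): %s", attempt, retries, exc)
            if attempt == retries:
                raise
            time.sleep(backoff ** attempt)


def is_valid(record: dict) -> bool:
    """Reject records missing required keys rather than failing the whole run."""
    return all(k in record for k in ("fund_id", "nav", "as_of"))


def load_partition(records: list[dict], run_date: date) -> None:
    """Write valid records to a date-partitioned raw zone; dead-letter the rest."""
    good = [r for r in records if is_valid(r)]
    bad = [r for r in records if not is_valid(r)]
    s3.put_object(
        Bucket=BUCKET,
        Key=f"raw/funds/dt={run_date.isoformat()}/part-0.json",
        Body=json.dumps(good).encode(),
    )
    if bad:
        dlq_key = f"deadletter/funds/{datetime.now(timezone.utc).isoformat()}.json"
        s3.put_object(Bucket=BUCKET, Key=dlq_key, Body=json.dumps(bad).encode())
        log.error("dead-lettered %d invalid records to %s", len(bad), dlq_key)
```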
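And because several bullets above lean on vector similarity search, here is an illustrative retrieval step, assuming Postgres with the pgvector extension and a hypothetical `documents(chunk, embedding)` table; the embedding function is whatever model provider you plug in.

```python
import psycopg2  # assumes Postgres with the pgvector extension installed


def top_k_context(conn, query_embedding: list[float], k: int = 5) -> list[str]:
    """Return the k document chunks nearest to the query embedding.

    `<->` is pgvector's L2-distance operator; with an ivfflat or hnsw index
    on `embedding`, this runs as an approximate nearest-neighbor scan.
    """
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with conn.cursor() as cur:
        cur.execute(
            "SELECT chunk FROM documents ORDER BY embedding <-> %s::vector LIMIT %s",
            (vec, k),
        )
        return [row[0] for row in cur.fetchall()]


# Usage: embed the user's question with your provider of choice, then fetch
# grounding context for the agent:
#   conn = psycopg2.connect("dbname=allocate")
#   context = top_k_context(conn, embed("Which funds focus on climate tech?"))
```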
- Extensive Data Engineering Experience: 5+ years of hands-on experience in data engineering (or related fields), including designing and building large-scale data pipelines and storage solutions. You should have taken projects through the full lifecycle from architecture design to production deployment.
- Cloud Proficiency (AWS): Strong experience working with AWS cloud services for data. You should be comfortable with tools like S3, EC2, ECS, EKS, Athena, Redshift, Glue, and Step Functions. Experience setting up infrastructure-as-code (Terraform/CloudFormation) for these services is a plus.
- Database and Data Modeling Skills: Proficiency in SQL and relational database design. Able to design efficient schemas and optimize queries/indexes for performance. Experience building or working with data warehouses or lakehouses (e.g. Snowflake, Databricks Delta Lake) is highly desired. Familiarity with graph databases (Neo4j, AWS Neptune, etc.) and designing knowledge graph schemas will help you hit the ground running.
- Programming Expertise: Fluency in at least one major programming language used in data engineering. Python is commonly used for data pipelines, and pandas/PySpark experience is valuable. However, we highly value experience with TypeScript/Node.js in data contexts as well, since our stack leans towards modern web technologies. The ideal candidate can work across languages – for example, writing a data API in C# or Node.js to interface with our backend, while also crafting Python scripts for data processing. Clean, maintainable code and adherence to best practices are a must.
- AI/ML Familiarity: While this is not a pure ML researcher role, you should understand how machine learning models consume data. Experience preparing datasets for training, working with feature stores, or integrating ML model outputs into applications is important. Knowledge of vector embeddings and experience with vector databases (Postgres pgvector, Chroma, Pinecone, etc.) will be a big plus, as our AI features rely on semantic search. Likewise, familiarity with frameworks for building AI agents or retrieval-augmented generation (e.g., LangChain, LlamaIndex) is important.
- DevOps and DataOps Skills: Solid understanding of containerization and deployment. Experience using Docker to package data applications and Kubernetes (or AWS EKS) to run distributed jobs/services. You should be comfortable setting up CI/CD pipelines for automated testing and deployment of data pipelines or ML models. Experience with workflow orchestrators (Airflow, Prefect, or similar) and transformation frameworks such as dbt is beneficial; a minimal DAG sketch follows this list.
- Strong Analytical and Problem-Solving Skills: Ability to analyze complex data problems, debug pipeline issues, and optimize system performance. You should be detail-oriented when it comes to data correctness and have a knack for troubleshooting data discrepancies or bottlenecks in processing.
- Leadership and Collaboration: Excellent communication skills and a collaborative mindset. You will be working with a diverse, fully-remote team, so you need to articulate ideas clearly and build consensus. Experience mentoring other engineers or leading technical projects is important – you should be ready to take initiative and guide the team on best practices in data engineering. A positive attitude towards continuous learning and improvement is essential, as we value a growth mindset and adaptability.
- Bachelor’s degree in Computer Science, a similar technical field of study, or equivalent practical experience
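For the orchestration point above, here is a minimal Airflow DAG of the shape we have in mind; the `daily_fund_metrics` name and the task bodies are placeholders, and the sketch assumes Airflow 2.x.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull raw records from a source system (placeholder)."""


def transform():
    """Clean and reshape the extracted data (placeholder)."""


def load():
    """Write the transformed data to the warehouse (placeholder)."""


with DAG(
    dag_id="daily_fund_metrics",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow >= 2.4; older versions use schedule_interval
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task  # linear dependency chain
```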
- Providing our clients with a world-class experience is our number one priority. We obsessively search for ways to improve the experience for our clients and partners. This requires extraordinary response times, proactivity, and ensuring that everything we do, from product strategy to offline communications, delivers a top-tier client experience.
- Challenge convention: Instead of detailing all the reasons why an idea may not work, we constantly question things to determine how a viable idea may be put into motion.
- Commitment to continuous improvement: We find ways to personally scale each day by pushing ourselves up the learning curve.
- Meritocracy, not politics: We place the utmost value on results and reward merit, not actions driven by political agendas or behavior.
- Civil discourse is embraced: We believe open, intellectually curious conversations are required to consistently arrive at the best decisions. Respect is paramount in our dealings with one another, but our mission is always to get the right answer collectively, not to be right.
- Embrace technological change: We adopt tools and techniques that make us faster, smarter, and better. We stay open to innovation, especially around AI and automation, and drop outdated methods without hesitation. Complacency kills progress; we value adaptability and curiosity.
- Fully Remote Position
- Travel required for offsites
- An in-person interview may be required during the interview process
- A broadband internet connection is required
- Seniority: Mid-level
- Location: All I-9 eligible candidates will be considered
- Salary: $160-200K base, plus bonus and equity as part of a competitive early-stage fintech startup package
- Benefits: Medical, dental, and vision insurance; 401(k); and responsible time off (RTO)
- Employment: Full-time
- Compliance with Allocate's Code of Ethics is a given for this role.