Akaike Technologies

Senior Data Scientist

Bengaluru, India; Chennai, India
Data Science, LLM, Machine Learning, Generative AI, PySpark, SQL, Databricks, Python, PyTorch, TensorFlow, AWS Lambda, FastAPI, LangChain, LlamaIndex, DSPy, RAG, vLLM, Neo4j
Description

Senior Data Scientist

Location: Bengaluru, India

Department: Projects & Delivery

Experience: 4-5 years

Skills: Data Science, LLM, Machine Learning

Experience: 4+ years

Location: Bengaluru, Chennai (Hybrid)

Team: Data Science & AI

About Akaike Technologies

At Akaike Technologies, we are redefining the boundaries of enterprise intelligence. We are seeking a highly specialized Senior Data Scientist who thrives at the intersection of Generative AI and Classical Machine Learning.
This role is designed for a practitioner who does not just "call APIs" but understands the mathematics behind Transformers and can architect complex, high-accuracy Agentic systems. You will spend roughly 60% of your time on Generative AI (Agents, RAG, SQL-Gen) and 40% on robust Classical ML/Deep Learning (Forecasting, Classification, Custom Architectures), all backed by a strong PySpark data foundation.

Key Responsibilities

1. Generative AI & Agentic Systems (60% Focus)

SQL-Based Agent Architecting: Design and deploy accurate Text-to-SQL agents that query complex enterprise databases reliably. Focus on schema linking, error handling, and self-correction mechanisms.
Multi-Agent Systems: Build sophisticated Agentic Workflows using patterns such as ReAct and Agent-Critique. Orchestrate systems (using frameworks such as LangGraph or CrewAI) in which agents collaborate to critique and improve each other's outputs before final execution.
RAG & Long-Context Optimization: Develop production-grade Retrieval-Augmented Generation (RAG) systems. Optimize chunking strategies, vector search (Pinecone/Milvus/Weaviate), and re-ranking algorithms to minimize hallucinations.
LLM Evaluation & Fine-Tuning: Move beyond basic prompting. Implement LLM-as-a-judge evaluation frameworks to quantitatively measure agent accuracy. Perform Parameter-Efficient Fine-Tuning (PEFT, e.g., LoRA) on open-source models (Llama 3, Mistral) for domain-specific tasks.

2. Classical ML, Deep Learning & Transformer Internals (40% Focus)

Transformer Internals: Demonstrate deep command of Transformer architectures. Go beyond pre-trained models to design custom loss functions or modify attention mechanisms to address specific data nuances.
Custom Business Modeling: Build bespoke predictive models for complex business scenarios, such as targeting, budget optimization, and churn prediction, where off-the-shelf solutions fail.
Advanced Deep Learning: Use 1D/2D CNNs, LSTMs, and representation learning for complex pattern recognition in non-text data (time series, behavioral logs).
Sparsity & Nuance: Handle real-world data challenges, including Positive-Unlabeled (PU) learning and one-class learning.

3. Data Science at Scale (PySpark & Databricks)

Billion-Scale Processing: You should not need to rely on Data Engineers for every table. You must be comfortable writing optimized PySpark/Spark SQL jobs on Databricks that process billions of rows to create training data.
Feature Engineering: Build complex feature stores in a distributed environment, ensuring consistency between training and inference.

4. Architecture, MLOps & Lifecycle Management

System Architecture Design: Architect end-to-end ML systems, making critical trade-off decisions between latency, cost, and accuracy. Design modular components for reusability and scalability across the organization.
A/B Testing & Measurement: Design and execute rigorous A/B tests (or interleaving experiments) to validate model impact in production. Define clear success metrics (offline proxies vs. online business KPIs) and assess the statistical significance of results.
Continuous Improvement (CI/CD/CT): Establish feedback loops for model monitoring. Detect data drift and concept drift, and implement automated retraining strategies to ensure models improve continuously over time.
Serverless Pipelines: Design scalable deployment pipelines utilizing AWS Lambda, Step Functions, and FastAPI for event-driven and real-time inference.

5. Leadership & Stakeholder Management

Strategic Problem Formulation: Proactively identify opportunities to apply data science by analyzing product roadmaps and market scenarios. Translate abstract business goals (e.g., "maximize user engagement" or "reduce marketing spend") into concrete, solvable mathematical problems.
Technical Mentorship: Actively mentor junior data scientists. Conduct rigorous code reviews, enforce design patterns, and foster a culture of engineering excellence within the team.
Client Handling: Serve as the primary technical point of contact for clients. Explain model limitations transparently to non-technical stakeholders and manage expectations regarding AI capabilities.

Must-Have Skills

Core Technical Stack:
GenAI Frameworks: Advanced proficiency with LangChain, LlamaIndex, or DSPy. Experience building agents that interact with SQL databases is critical.
Deep Learning: PyTorch or TensorFlow. Deep understanding of Attention mechanisms, Encoder-Decoder architectures, and Embeddings.
Big Data: Expert-level PySpark and SQL. Ability to debug Spark jobs and optimize partitions/shuffles on Databricks.
Programming: Python (OOP, typing, rigorous code standards).

Experience & Soft Skills:
Proven track record of deploying at least one Agentic System or complex RAG pipeline to production.
Experience treating SQL as a first-class citizen in GenAI workflows (Text-to-SQL).
Experimental Mindset: Strong grasp of statistical testing, experimental design, and metrics evaluation (A/B testing).
Strong Communication: Ability to articulate complex technical concepts to business leaders without oversimplifying the risks.

Nice to Have

Experience with vLLM or TGI for serving open-source models.
Experience combining Knowledge Graphs (Neo4j) with LLMs (GraphRAG).
Publications or active contributions to the Open Source AI community.

Benefits & Perks

Competitive Compensation & ESOPs.
Budget for compute (GPUs) for experimentation.
Sponsorship for top-tier AI conferences (NeurIPS, ICML, etc.).
A culture that values "Science" in Data Science: we encourage reading papers and trying novel architectures.
