Senior Engineer - Data/ML Platforms SRE
As a Senior Reliability Engineer, you will play a critical role in ensuring the robustness, availability, and performance of our cutting-edge Data Engineering and Machine Learning Platforms. You'll collaborate closely with cross-functional teams, including platform developers and infrastructure experts, to enhance the reliability and resilience of our modern platforms. If you're passionate about pushing the boundaries of technology and thrive in a dynamic environment, this role is for you. You will help drive our insurance business transformation as we transition from a traditional IT model to a tech organization with engineering excellence as its mission, while co-creating a culture of psychological safety and continuous improvement.
Responsibilities:
Reliability Enhancements:
- Design, develop, and implement software solutions that enhance the reliability and fault tolerance of our modern data and machine learning platforms
- Collaborate with software engineers to create robust, scalable, and efficient platforms
- Proactively identify and address potential reliability bottlenecks and performance issues
Automation and Monitoring:
- Develop and maintain automated processes for deployment, scaling, and maintenance of platforms
- Build effective monitoring systems to detect anomalies, performance degradation, and capacity issues
- Implement proactive measures to prevent incidents
Incident Response and Troubleshooting:
- Participate in on-call rotations to respond to incidents promptly
- Investigate and resolve storage-related incidents, ensuring minimal impact on services
- Conduct post-incident reviews to learn from incidents and improve system reliability
Capacity Planning and Scaling:
- Collaborate with infrastructure teams to plan for storage and compute capacity needs
- Scale storage systems efficiently to accommodate growing demands
- Optimize resource utilization while maintaining high availability
Documentation and Knowledge Sharing:
- Document processes, procedures, and best practices
- Share knowledge with colleagues to foster a culture of continuous improvement
- Mentor junior engineers
Qualifications:
- Bachelor’s degree in Computer Science, Information Systems, or equivalent education or work experience
- Minimum of 5 years of experience in Data Engineering pipeline-related roles
- Experience in the Big Data ecosystem: ETL, Big Data platform tooling (Apache Spark, Airflow), data lakes, Synapse or Snowflake
- Experience in the Machine Learning ecosystem: model training, inference, experimentation, and pipeline infrastructure.
- Proficiency in modern on-premises object storage technologies (Ceph, MinIO) and their cloud equivalents (AWS S3, Azure Blob Storage, Google Cloud Storage)
- Experience with infrastructure automation, tooling, and configuration management frameworks (e.g., Puppet, Chef, Ansible, Terraform, Pulumi)
- Fluency in SQL and NoSQL
- Knowledge of CS data structures and algorithms.
- Fluency and specialization in at least two modern languages such as Java, Python, or Go, including object-oriented design.
- Experience with Prometheus, Loki, and Grafana.
- Experience with container orchestration platforms (Kubernetes or Docker Swarm).
- Experience with Linux and the open-source ecosystem
- Self-driven, with an analytical, first-principles approach
- Ability to take a complex challenge and deliver quality simple solutions
- Effective communication skills for cross-functional collaboration.
Annual Salary
$82,000.00 - $185,000.00
The above annual salary range is a general guideline. Multiple factors are taken into consideration to arrive at the final hourly rate/annual salary to be offered to the selected candidate. Factors include, but are not limited to, the scope and responsibilities of the role, the selected candidate’s work experience, education and training, the work location as well as market and business considerations.
Benefits:
As an Associate, you’ll enjoy our Total Rewards Program* to help secure your financial future and preserve your health and well-being, including:
- Premier Medical, Dental and Vision Insurance with no waiting period**
- Paid Vacation, Sick and Parental Leave
- 401(k) Plan
- Tuition Reimbursement
- Paid Training and Licensures
*Benefits may be different by location. Benefit eligibility requirements vary and may include length of service.
**Coverage begins on the date of hire. Must enroll in New Hire Benefits within 30 days of the date of hire for coverage to take effect.
The equal employment opportunity policy of the GEICO Companies provides for a fair and equal employment opportunity for all associates and job applicants regardless of race, color, religious creed, national origin, ancestry, age, gender, pregnancy, sexual orientation, gender identity, marital status, familial status, disability or genetic information, in compliance with applicable federal, state and local law. GEICO hires and promotes individuals solely on the basis of their qualifications for the job to be filled.
GEICO reasonably accommodates qualified individuals with disabilities to enable them to receive equal employment opportunity and/or perform the essential functions of the job, unless the accommodation would impose an undue hardship to the Company. This applies to all applicants and associates. GEICO also provides a work environment in which each associate is able to be productive and work to the best of their ability. We do not condone or tolerate an atmosphere of intimidation or harassment. We expect and require the cooperation of all associates in maintaining an atmosphere free from discrimination and harassment with mutual respect by and for all associates and applicants.