We are looking for a highly skilled PySpark Developer with deep expertise in distributed data processing. The ideal candidate will be responsible for optimizing Spark jobs and ensuring efficient data processing on a big data platform. This role requires a strong understanding of Spark performance tuning, distributed computing, and big data architecture.
Build data tools that help analytics and data science team members develop and optimize our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems
- 8+ years of relevant experience in application development or systems analysis
- Ability to adjust priorities quickly as circumstances dictate
Key Responsibilities:
- Analyze and understand existing data ingestion and reconciliation frameworks
- Develop and implement PySpark programs to process large datasets in Hive tables and big data platforms
- Perform complex transformations, including reconciliation and advanced data manipulations
- Fine-tune Spark jobs for performance, ensuring efficient data processing at scale
- Work closely with data engineers, architects, and analysts to understand data reconciliation requirements
- Collaborate with cross-functional teams to improve data ingestion, transformation, and validation workflows
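The reconciliation work described above can be sketched as follows. This is a simplified, pure-Python illustration of the pattern (comparing a source extract against a target load by key and amount); in practice the same comparison would be expressed as PySpark DataFrame joins and aggregations over Hive tables. All record and field names (`txn_id`, `amount`) are hypothetical.

```python
# Illustrative reconciliation between a "source" extract and a "target" load.
# In production this logic would run as PySpark DataFrame joins over Hive
# tables; plain dicts are used here only to show the pattern.

def reconcile(source_rows, target_rows, key="txn_id", amount_field="amount"):
    """Return keys missing from the target and keys whose amounts differ."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}

    # Records present in the source but never loaded into the target
    missing = sorted(set(source) - set(target))

    # Records loaded, but with a different amount than the source
    mismatched = sorted(
        k for k in set(source) & set(target)
        if source[k][amount_field] != target[k][amount_field]
    )
    return {"missing": missing, "mismatched": mismatched}


source = [
    {"txn_id": 1, "amount": 100.0},
    {"txn_id": 2, "amount": 250.0},
    {"txn_id": 3, "amount": 75.5},
]
target = [
    {"txn_id": 1, "amount": 100.0},
    {"txn_id": 2, "amount": 249.0},  # amount drifted during load
]

report = reconcile(source, target)
print(report)  # {'missing': [3], 'mismatched': [2]}
```

At Spark scale, `missing` corresponds to a left anti-join on the key and `mismatched` to an inner join filtered on unequal amounts.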
Required Skills and Qualifications:
- Extensive hands-on experience with Python, PySpark, and PyMongo for efficient data processing across distributed and columnar databases
- Expertise in Spark optimization techniques, with the ability to debug Spark performance issues and optimize resource utilization
- Proficiency in Python and the Spark DataFrame API, with strong experience in complex data transformations using PySpark
- Experience with large-scale distributed data processing and a solid understanding of big data architecture and distributed computing frameworks
- Strong problem-solving and analytical skills
- Experience with CI/CD for data pipelines
- Experience with Snowflake for data processing and integration
Education:
- Bachelor’s/University degree in Computer Science, or equivalent experience
- Master’s degree preferred
Job Family Group: Technology
Job Family: Applications Development
Time Type: Full time
Citi is an equal opportunity and affirmative action employer.
Qualified applicants will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran.
Citigroup Inc. and its subsidiaries ("Citi") invite all qualified interested applicants to apply for career opportunities. If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View the "EEO is the Law" poster. View the "EEO is the Law" Supplement.
View the EEO Policy Statement.
View the Pay Transparency Posting.