Responsibilities:
- Install, configure, and maintain HPC clusters, including hardware and software components.
- Monitor system performance, identify bottlenecks, and implement solutions to optimize performance.
- Manage user accounts, permissions, and resource allocation.
- Perform regular system maintenance, updates, and patching.
- Troubleshoot and resolve hardware and software issues in a timely manner.
- Participate in the design and planning of HPC infrastructure upgrades and expansions.
- Evaluate and recommend hardware and software solutions to meet evolving computational needs.
- Implement and manage storage systems, networking infrastructure, and interconnects (e.g., InfiniBand).
- Optimize system configurations and application performance for HPC workloads.
- Profile and analyze application performance to identify areas for improvement.
- Implement and utilize performance monitoring tools and techniques.
- Provide technical support and training to HPC users.
- Collaborate with researchers and scientists to understand their computational requirements.
- Work closely with HPC architects and engineers to ensure that research needs are met.
- Document system configurations, procedures, and best practices.
- Assist HPC engineers and architects with day-to-day operations and ticket management.
- Implement and maintain security measures to protect HPC infrastructure and data.
- Ensure compliance with relevant security policies and regulations.
- Manage data backups and disaster recovery procedures.
Qualifications:
- Bachelor's degree in computer science, engineering, or a related field. Experience may substitute for the degree.
- Minimum of 10 years of experience working with systems; 5 years specifically with HPC.
- Strong knowledge of Linux operating systems (e.g., Rocky Linux, Ubuntu).
- Experience with cluster management tools (e.g., Slurm, PBS).
- Familiarity with high-speed interconnects (e.g., InfiniBand, Ethernet).
- Knowledge of parallel file systems (e.g., Lustre, Ceph, GPFS).
- Proficiency in scripting languages (e.g., R, Python, Bash).
- Understanding of HPC hardware architectures and technologies (e.g., CPUs, GPUs, memory).
- Strong demonstrated experience with a major configuration management tool (e.g., Terraform, Ansible), including application packaging and installation.
- Must have strong knowledge of Linux security and Linux shell scripting.
- Strong communication and interpersonal skills.
- Knowledge of data transfer protocols and large-scale storage solutions.