As the Company at the forefront of the creation of the Mobile world, and with more than 60,000 patents to our name, we’ve made it our business to make a mark. Being part of Ericsson empowers you to learn, lead and perform at your best, shaping future technology. Ericsson is an inclusive employer where you are recognized for the skills, talent, and perspective you bring to the team.
Within the Solution Area Cognitive Networks Solutions Software R&D (SA CNS SW R&D), we offer the opportunity to collaborate with highly qualified Global teams and to enable success stories for our customers. You will be exposed to groundbreaking technology (5G, ML/AI, Automation and Cloud computing) and support the delivery of multiple Data-Intensive projects from our customer base and internal requirements. You must be able to perform hands-on Cloud and Data engineering tasks independently, together with the appropriate testing. Are you ready to write the future with us?
Come, and be where it begins.
We are seeking a seasoned Senior/Tech Lead Site Reliability Engineer (SRE) to oversee the design, deployment, and maintenance of our cloud-native SaaS infrastructure on AWS, working with Ericsson R&D Global Teams. This position demands in-depth expertise in AWS technologies (including Fargate and App Runner), Terraform, Python (AWS SDK), Helm, and GitOps. The individual will lead a dedicated SRE team to establish best practices, optimize service reliability, and continuously improve security and performance across a multi-tenant SaaS environment.
Key Responsibilities
Cloud-Native SaaS Architecture
- Architect, deploy, and manage multi-tenant SaaS Cognitive Solutions on AWS using AWS services (e.g., IAM, S3, EKS, ECS, Fargate, App Runner, Redshift, SNS, SQS, EventBridge, Athena, SageMaker, Aurora, DynamoDB, Cognito, API Gateway) to build microservices, data flows, data warehouses, and AI/ML models, emphasizing scalability, reliability, and cost efficiency.
- Champion microservices, container orchestration, and serverless paradigms to ensure high availability and optimal performance.
Infrastructure as Code (IaC)
- Develop and maintain infrastructure definitions using Terraform to enable reliable, automated, and repeatable deployments.
- Collaborate with cross-functional teams to incorporate IaC principles into CI/CD pipelines, accelerating feature releases and minimizing downtime.
Site Reliability Engineering & Observability
- Define and track Service Level Indicators (SLIs) and Service Level Objectives (SLOs), establishing error budgets that align with organizational goals.
- Implement robust observability solutions (e.g., AWS CloudWatch, CloudTrail, AWS Config, etc.) to proactively detect and resolve performance bottlenecks.
Containerization & Helm
- Utilize Kubernetes (EKS) and Helm charts to package, configure, and deploy containerized applications efficiently.
- Streamline container orchestration workflows, focusing on auto-scaling, upgrades, rollbacks, and enhanced service resiliency.
GitOps & Automation
- Employ GitOps tools (Argo CD, Flux) to govern infrastructure and application deployments through declarative, version-controlled configurations.
- Automate operational tasks using scripting languages (Python, Bash, PowerShell) and AWS SDK (boto3), improving developer productivity and reducing manual overhead.
DevSecOps & Compliance
- Embed security best practices within the software development lifecycle, covering identity and access management (IAM), networking, VPC, encryption, and monitoring.
- Ensure adherence to cloud compliance standards (SOC 2, HIPAA, GDPR, etc.), performing regular audits and vulnerability scans to maintain a robust security posture.
AI & Machine Learning Operations (MLOps)
- Provide operational support for AI/ML models running on AWS, collaborating with data science teams to optimize performance and reliability.
- Integrate MLOps methodologies into existing workflows, ensuring seamless model deployment, monitoring, and updates.
Performance & Cost Optimization
- Conduct capacity planning, load testing, and performance tuning across AWS resources.
- Leverage reserved instances, auto-scaling, and right-sizing strategies to balance reliability, performance, and cost effectiveness.
Incident Management & Continuous Improvement
- Oversee on-call rotations and lead incident response, rapidly mitigating service disruptions and guiding root cause analysis.
- Foster a culture of continuous improvement, refining operational processes and enhancing platform architecture to boost resilience.
Leadership & Mentorship
- Manage and mentor a cross-functional SRE team, promoting a collaborative, results-driven environment and advancing professional growth.
- Collaborate with product owners, development teams, and stakeholders to align SRE priorities with broader business objectives.
Required Qualifications
Education
Bachelor’s degree in Computer Science, Computer Engineering, or a related field.
Experience
Overall Software Development: 6+ years of professional experience in software development.
Site Reliability Engineering: 3+ years of dedicated SRE experience with a primary focus on AWS cloud services and infrastructure.
Technical Expertise
Cloud Computing Concepts: Deep understanding of virtualization, networking, and storage in public cloud environments.
AWS Proficiency: Demonstrated ability to manage, operate, and secure AWS services (e.g., IAM, S3, EKS, ECS, Fargate, App Runner, Redshift, SNS, SQS, EventBridge, Athena, SageMaker, Aurora, DynamoDB, Cognito, API Gateway).
AWS for AI/ML: Hands-on support of AI/ML model operations on AWS, collaborating with data science teams and optimizing ML workloads.
Kubernetes & Container Management: Proven experience with Kubernetes (preferably EKS) for container orchestration, including deploying and maintaining production workloads.
Helm Package Management: Skilled in creating and managing Helm charts for Kubernetes-based applications.
IaC Frameworks: Proficiency in Terraform and Burrito (if applicable), ensuring production-grade, scalable infrastructure definitions.
Scripting & Automation: Advanced skills in Python (including AWS SDK/boto3), Bash, and/or PowerShell for automating cloud operations.
DevSecOps & GitOps: Hands-on experience integrating security best practices into CI/CD pipelines, leveraging GitOps tools (Argo CD, Flux) for declarative deployments.
MLOps: Working knowledge of machine learning lifecycle management, ensuring robust and efficient AI/ML model deployments.
Linux Administration: Strong background in Linux system management, performance tuning, and troubleshooting.
Networking: Expertise in VPNs, firewalls, routing, switching, DNS, load balancers, and related security considerations.
Monitoring & Observability: Proficiency with one or more monitoring solutions (Datadog, Prometheus, Grafana, CloudWatch) to drive proactive incident response.
Security & Compliance: In-depth familiarity with SOC 2, HIPAA, GDPR, and best practices around IAM, encryption, and network segmentation.
Problem-Solving & Communication: Demonstrated strength in diagnosing complex technical issues and effectively communicating solutions to varied stakeholders.
Certifications
AWS Certifications: AWS Certified Solutions Architect (Associate/Professional), AWS Certified DevOps Engineer – Professional, or other relevant certifications.
Additional certifications in GCP, Azure, or security (CISSP, CISM) are considered advantageous.
Additional Desirable Qualifications
Other Cloud Environments: Exposure to Azure or further GCP services beyond AI/ML is beneficial.
Advanced Programming/Scripting: Experience in Python, Go or other modern languages is a plus.
Team Leadership: Demonstrated success in building and leading cross-functional teams, including performance management and strategic planning.
Non-technical skills:
Be inspired by the needs of fast-changing environments.
Happy to work within distributed teams.
Coordinate with software, DevSecOps, and domain experts.
Proactive & team player.
Excellent oral and written communication skills.
Why join Ericsson?
At Ericsson, you’ll have an outstanding opportunity. The chance to use your skills and imagination to push the boundaries of what’s possible. To build solutions never seen before to some of the world’s toughest problems. You’ll be challenged, but you won’t be alone. You’ll be joining a team of diverse innovators, all driven to go beyond the status quo to craft what comes next.
What happens once you apply?
Click Here to find all you need to know about what our typical hiring process looks like.
Encouraging a diverse and inclusive organization is core to our values at Ericsson; that's why we champion it in everything we do. We truly believe that by collaborating with people with different experiences we drive innovation, which is essential for our future growth. We encourage people from all backgrounds to apply and realize their full potential as part of our Ericsson team. Ericsson is proud to be an Equal Opportunity Employer.
Primary country and city: Egypt (EG)
Req ID: 762790