About Appier
Appier is a software-as-a-service (SaaS) company that uses artificial intelligence (AI) to power business decision-making. Founded in 2012 with a vision of democratizing AI, Appier’s mission is turning AI into ROI by making software intelligent. Appier now has 17 offices across APAC, Europe and the U.S., and is listed on the Tokyo Stock Exchange (Ticker number: 4180). Visit www.appier.com for more information.
About the role
Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Appier's services, both internally critical and externally visible, have reliability and uptime appropriate to customers' needs and a fast rate of improvement. Additionally, SREs keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation. On the SRE team, you’ll have the opportunity to manage the complex challenges of scale that are unique to Appier, while using your expertise in coding, algorithms, complexity analysis and large-scale system design. The work spans source code management, continuous integration, artifact packaging, continuous deployment, service traffic management, service registration and discovery, as well as holistic observability and the underlying compute runtime and container orchestration. Together, these form a collection of platforms and capabilities that accelerate development velocity while protecting Appier’s production availability. We are hiring at all levels of seniority in this space. This is a local hire position.
Responsibilities
- Engage in and improve the whole lifecycle of services—from inception and design, through to deployment, operation and refinement.
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless postmortems.
- Participate in the on-call rotation (remote on-call).
About you
[Minimum qualifications]
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- 2+ years of experience with software development in one or more programming languages.
- 2+ years of experience with Linux system administration.
- 1+ years of experience in designing, analyzing, and troubleshooting large-scale distributed systems, and 1+ years of experience leading projects and providing technical leadership.
- Hands-on experience in planning and deploying services in production.
[Preferred qualifications]
- Experience in architecting, developing, or maintaining production-grade cloud solutions in virtualized environments.
- Experience in deployment and orchestration technologies (such as Docker, Puppet, Chef, Salt, Ansible).
- Experience in building and deploying automation and continuous integration systems.
- Experience in operating big data systems related to data access, collection, processing and storage.
- Experience in operating and deploying online web services.
- Experience in operating services on IaaS such as AWS and GCP.
- Experience in database management (e.g., database system setup, backup and restore, system tuning); experience with MongoDB, Cassandra, MySQL, and PostgreSQL is a plus.
- Security knowledge, such as setting up firewalls, designing proper security policies, and defending against network attacks.
- Working knowledge of virtualization, hosted services, multi-tenant cloud infrastructures, storage systems and content delivery networks.