Site Reliability Engineering Manager - JFrog
At JFrog, we’re reinventing DevOps to help the world’s greatest companies innovate -- and we want you along for the ride. This is a special place with a unique combination of brilliance, spirit and just all-around great people. Here, if you’re willing to do more, your career can take off. And since software plays a central role in everyone’s lives, you’ll be part of an important mission. Thousands of customers, including the majority of the Fortune 100, trust JFrog to manage, accelerate, and secure their software delivery from code to production -- a concept we call “liquid software.” Wouldn't it be amazing if you could join us in our journey?
We are looking for a Site Reliability Engineering Manager to lead our Israel SRE team. In this role, you'll drive best practices in reliability engineering, ensuring the stability, availability, and performance of JFrog’s SaaS services. You'll collaborate with global SRE leaders, refine processes, and foster a culture of accountability and continuous improvement.
As a Site Reliability Engineering Manager at JFrog you will…
- Lead, mentor, and develop a high-performing Israel SRE team, fostering collaboration, innovation, and accountability
- Ensure SaaS reliability, performance, and availability, meeting or exceeding service-level objectives
- Drive SRE best practices, including capacity planning, incident management, chaos engineering, and disaster recovery
- Implement proactive monitoring, alerting, and anomaly detection aligned with SaaS standards
- Collaborate with P&E and Cloud engineering teams to embed reliability into the SDLC
- Oversee incident management, ensuring swift identification, escalation, and resolution
- Maintain comprehensive SRE documentation, including processes, incident reports, and system architecture
- Evaluate and adopt tools, technologies, and methodologies to enhance uptime and reliability
To be a Site Reliability Engineering Manager at JFrog you need…
- 3+ years of management experience leading an SRE, DevOps, or similar team in a SaaS environment
- Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent experience)
- Strong expertise in cloud platforms (AWS, GCP, or Azure), containers (Kubernetes, Docker), and infrastructure-as-code and configuration management tools (Terraform, Ansible)
- Proficiency in Python or Go for automation and system optimization, as well as GitOps experience with SCM tools (e.g., Git, Bitbucket)
- Strong leadership, communication, and collaboration skills, with experience working across globally distributed teams
- Familiarity with Agile methodologies, CI/CD pipelines, and orchestration tools (Jenkins, ArgoCD, StackStorm)
- Familiarity with Chaos Engineering (e.g., Gremlin, Litmus, Chaos Toolkit)
- Hands-on experience with alerting and observability tools (e.g., PagerDuty, OpsGenie, New Relic, Coralogix)
- Strong understanding of scalability, high availability, and security best practices in cloud & Kubernetes environments