Site Reliability Engineer, Practices Team

Are you looking to hire?

Thankz can connect you with outstanding candidates for the Site Reliability Engineer, Practices Team role. If you're searching for top talent for this or a similar position, our team can find the right person for your specific needs and requirements.

As a Site Reliability Engineer on the Practices Team, you'll play a key role in ensuring the reliability, scalability, and performance of our systems. You'll work on infrastructure, automation, and troubleshooting to optimize our platform and deliver exceptional user experiences. Working remotely, you'll use your expertise to help us maintain system health, increase productivity, and minimize downtime.

What you'll be doing

Developing and maintaining system health indicators
Automating manual operational work by coding and scripting
Troubleshooting and resolving service-related issues
Collaborating with software engineers to make our services more reliable, scalable, and efficient
Improving the reliability and resilience of our infrastructure through root cause analysis and by reviewing gaps in designs and implementations
Participating in on-call rotations and driving the restoration and repair of service-impacting issues
Implementing practices that reduce the likelihood of incidents recurring
Continually updating our incident playbook by documenting mitigation steps for potential issues
Advocating for reliability, and for prioritizing reliability fixes, with engineers throughout the organization

Requirements

Degree in Computer Science, IT, or relevant field
C1/C2 English proficiency (written and spoken)
Proven experience as a Site Reliability Engineer or similar software engineering role
Knowledge of high-level programming languages (e.g., Python, Java)
Experience with distributed storage technologies like NFS, HDFS, Ceph, S3
Familiarity with distributed computing and data tools (e.g., MapReduce, Hadoop, Hive, Spark, Gurobi, MemSQL)
Strong problem-solving skills and ability to work under pressure
Good knowledge of Linux
Understanding of networking protocols

Preference will be given to candidates with a knack for problem-solving and a deep understanding of system design, data structures, and algorithms. We're particularly interested in people who are passionate about learning new technologies.

We offer a full-time remote position on US hours (40-hour workweek, Monday to Friday), with excellent long-term growth prospects for an ambitious, experienced Site Reliability Engineer. We can offer HMO coverage and other benefits to candidates based in the Philippines.
