Filled Positions

Site Reliability Engineer, Practices Team

Are you looking to hire?

Thankz offers a range of outstanding candidates for Site Reliability Engineer, Practices Team roles. If you're searching for top talent for this or a similar position, our team can find the ideal person to meet your specific needs and requirements.

As a Site Reliability Engineer on the Practices Team, you'll be a key player in ensuring the reliability, scalability, and performance of our systems. Your role will involve working on infrastructure, automation, and troubleshooting tasks to optimize our platform and provide exceptional user experiences. Operating remotely, your expertise will help us uphold system health, increase productivity, and minimize system downtime. 

What you'll be doing 

  • Developing and maintaining system health indicators 
  • Automating manual operational work by coding and scripting 
  • Troubleshooting and resolving service-related issues 
  • Collaborating with software engineers to make our services more reliable, scalable, and efficient 
  • Improving the reliability and resilience of our infrastructure through root cause analysis and reviewing gaps in designs & implementations 
  • Participating in on-call rotations, driving restoration and repair of service-impacting issues 
  • Implementing practices that reduce the likelihood of recurrence of incidents 
  • Continually updating our incident playbook by documenting mitigation steps for potential issues 
  • Advocating for reliability and priority fixes to engineers throughout the organization 

Requirements 

  • Degree in Computer Science, IT, or relevant field  
  • C1/C2 English proficiency (both written and spoken)
  • Proven experience as a Site Reliability Engineer or in a similar software engineering role
  • Knowledge of high-level programming languages (e.g., Python, Java) 
  • Experience with distributed storage technologies like NFS, HDFS, Ceph, S3 
  • Familiarity with distributed computing (MapReduce, Hadoop, Hive, Spark, Gurobi, MemSQL) 
  • Strong problem-solving skills and ability to work under pressure 
  • Good knowledge of Linux 
  • Understanding of networking protocols 

Candidates with a knack for problem-solving and a deep understanding of system design, data structures, and algorithms will be given preference. We're particularly interested in individuals who have a passion for learning new technologies.

We offer a full-time remote position on US hours, with a 40-hour workweek (Monday to Friday) and excellent prospects for long-term growth for an ambitious, experienced Site Reliability Engineer on the Practices Team. We can offer HMO and other benefits to Philippine candidates.