Filled Positions

Site Reliability Engineer (Remote)

Are you looking to hire?

Thankz offers a range of outstanding Site Reliability Engineer (Remote) candidates. If you're searching for top talent in this field or for a similar position, our team can find the ideal person to meet your specific requirements.

As a Site Reliability Engineer, you will play a vital role in ensuring the reliability, scalability, and performance of our systems. You will collaborate with cross-functional teams to optimize our infrastructure, automate processes, and proactively address any potential issues.  

What you'll be doing 

  • Designing, implementing, and maintaining scalable and resilient infrastructure solutions 
  • Automating deployment, configuration, and monitoring processes to streamline operations 
  • Conducting performance testing and capacity planning to identify bottlenecks and optimize system performance 
  • Troubleshooting production incidents and implementing effective resolutions 
  • Implementing monitoring and alerting systems to proactively identify and address issues
  • Collaborating with development teams to optimize application performance and reliability 
  • Participating in on-call rotations and responding to incidents in a timely manner 
  • Conducting root cause analysis to identify underlying issues and prevent future occurrences 
  • Continuously researching and evaluating new technologies and best practices to enhance system reliability 

Requirements 

  • Bachelor's degree in Computer Science, Information Systems, or a related field  
  • Proven experience as a Site Reliability Engineer or in a similar role 
  • C1/C2 English proficiency (both written and spoken)
  • Strong background in Linux/Unix administration and scripting 
  • Proficiency in at least one programming language (e.g., Python, Go, Java) 
  • Experience with configuration management tools (e.g., Ansible, Chef, Puppet) 
  • Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
  • Familiarity with monitoring and logging tools (e.g., Prometheus, ELK stack) 
  • Understanding of networking principles and protocols (TCP/IP, DNS, HTTP) 
  • Excellent problem-solving and troubleshooting skills 

Preferred candidates possess a deep understanding of cloud platforms, containerization technologies, and monitoring tools, along with a strong background in infrastructure and automation. They have a passion for ensuring system reliability, scalability, and performance. Excellent problem-solving skills, the ability to work well in a remote team environment, and a proactive mindset are highly valued.

We offer a full-time remote position on US hours, with a 40-hour workweek (Monday to Friday) and excellent prospects for long-term growth for an ambitious, experienced Site Reliability Engineer. We can offer HMO and other benefits to candidates based in the Philippines.