Site Reliability Engineer (Remote)

Are you looking to hire?

Thankz offers a range of outstanding Site Reliability Engineer (Remote) candidates. If you're searching for top talent in this field or a similar position, our team can find the ideal person who meets your specific needs and requirements.

As a Site Reliability Engineer, you will play a vital role in ensuring the reliability, scalability, and performance of our systems. You will collaborate with cross-functional teams to optimize our infrastructure, automate processes, and proactively address any potential issues.

What you'll be doing

Designing, implementing, and maintaining scalable and resilient infrastructure solutions
Automating deployment, configuration, and monitoring processes to streamline operations
Conducting performance testing and capacity planning to identify bottlenecks and optimize system performance
Troubleshooting production incidents and implementing effective resolutions
Implementing monitoring and alerting systems to proactively identify and address issues
Collaborating with development teams to optimize application performance and reliability
Participating in on-call rotations and responding to incidents in a timely manner
Conducting root cause analysis to identify underlying issues and prevent future occurrences
Continuously researching and evaluating new technologies and best practices to enhance system reliability

Requirements

Bachelor's degree in Computer Science, Information Systems, or a related field
Proven experience as a Site Reliability Engineer or in a similar role
C1/C2 English proficiency (both written and spoken)
Strong background in Linux/Unix administration and scripting
Proficiency in at least one programming language (e.g., Python, Go, Java)
Experience with configuration management tools (e.g., Ansible, Chef, Puppet)
Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
Familiarity with monitoring and logging tools (e.g., Prometheus, ELK stack)
Understanding of networking principles and protocols (TCP/IP, DNS, HTTP)
Excellent problem-solving and troubleshooting skills

Preferred candidates possess a deep understanding of cloud platforms, containerization technologies, and monitoring tools, along with a strong background in infrastructure and automation. They have a passion for ensuring system reliability, scalability, and performance. Excellent problem-solving skills, the ability to work well in a remote team environment, and a proactive mindset are highly valued.

We offer a full-time remote position on US hours (40-hour workweek, Monday to Friday) with excellent prospects for long-term growth for an ambitious, experienced Site Reliability Engineer. We can offer HMO and other benefits to Philippine candidates.
