Filled Positions
Site Reliability Engineer (Remote)
Are you looking to hire?
Thankz offers a range of outstanding Site Reliability Engineer (Remote) candidates. If you're searching for top talent for this or a similar position, our team can find the ideal person to meet your specific requirements.
As a Site Reliability Engineer, you will play a vital role in ensuring the reliability, scalability, and performance of our systems. You will collaborate with cross-functional teams to optimize our infrastructure, automate processes, and proactively address any potential issues.
What you'll be doing
- Designing, implementing, and maintaining scalable and resilient infrastructure solutions
- Automating deployment, configuration, and monitoring processes to streamline operations
- Conducting performance testing and capacity planning to identify bottlenecks and optimize system performance
- Troubleshooting production incidents and implementing effective resolutions
- Implementing monitoring and alerting systems to proactively identify and address issues
- Collaborating with development teams to optimize application performance and reliability
- Participating in on-call rotations and responding to incidents in a timely manner
- Conducting root cause analysis to identify underlying issues and prevent future occurrences
- Continuously researching and evaluating new technologies and best practices to enhance system reliability
Requirements
- Bachelor's degree in Computer Science, Information Systems, or a related field
- Proven experience as a Site Reliability Engineer or in a similar role
- English proficiency at C1/C2 level, both written and spoken
- Strong background in Linux/Unix administration and scripting
- Proficiency in at least one programming language (e.g., Python, Go, Java)
- Experience with configuration management tools (e.g., Ansible, Chef, Puppet)
- Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
- Familiarity with monitoring and logging tools (e.g., Prometheus, ELK stack)
- Understanding of networking principles and protocols (TCP/IP, DNS, HTTP)
- Excellent problem-solving and troubleshooting skills
Preferred candidates have a deep understanding of cloud platforms, containerization technologies, and monitoring tools, along with a strong background in infrastructure and automation. They are passionate about ensuring system reliability, scalability, and performance. Excellent problem-solving skills, the ability to work well in a remote team, and a proactive mindset are highly valued.
We offer a full-time remote position on US hours (40-hour workweek, Monday to Friday), with excellent prospects for long-term growth for an ambitious, experienced Site Reliability Engineer. We can offer HMO coverage and other benefits to candidates in the Philippines.