Location: St. Louis, MO
- We are looking for an experienced Site Reliability/DevOps Engineer with strong scripting and automation ability to join our growing team.
- Here, SRE/DevOps engineers own the end-to-end availability and reliability of our cloud services.
- They proactively work to identify factors impacting the availability of critical endpoints and provide solutions.
- They explore our cloud technologies and extend the limits of existing infrastructure to solve unique scaling challenges.
- Design, write, and deliver software to improve the reliability, scalability, capacity, and latency of Developer Network services
- Identify recurring problems and build the tools and processes to prevent them
- Build the tools and processes to help quickly triage issues and identify the component(s) that need to be fixed
- Identify and build monitoring and alarming solutions
- Work with distributed teams to ensure that components are properly instrumented to be reliably used, monitored, and debugged in the service
- Participate in a periodic 24×7 on-call rotation
- Bachelor’s degree in Computer Science, MIS, or a related field, or equivalent experience.
- 3+ years of related experience.
- Systems fluency (Windows, Linux, storage, networking).
- Knowledge of DevOps and Agile SAFe
- Observability systems (Prometheus/Grafana, ELK, Dynatrace)
- Modern software components (MongoDB, MSSQL, Elasticsearch, RabbitMQ, Kafka)
- Infrastructure and configuration automation (PowerShell, Ansible, VMware, RDM, VSCode)
- Experience working on one of the following: AWS, Azure, or Google Cloud
- Experience with monitoring solutions
- Experience with Kubernetes, Mesos, Docker, microservices
- Experience with MySQL, PostgreSQL, MongoDB, Cassandra, and cloud database solutions, plus database encryption