【AI Team】DevOps/SRE

Website iKala

【Responsibilities】
  1. Improve the service monitoring system so that it detects system problems early and prevents AI project failures.
  2. Discover and evaluate new tools and technologies that improve the team's development and operations.
  3. Communicate with the team through design docs, tech talks, and code reviews.
  4. Participate in solution design and advise other developers to build scalable, maintainable, and efficient systems.
  5. Build working monitoring and logging infrastructure tailored to distributed systems.
  6. Design, implement, and maintain infrastructure for application CI/CD, machine learning, and deep learning algorithm deployment pipelines.
  7. Have fun as part of an awesome team.
【Requirements】
  1. 1+ years of experience with UNIX/Linux systems administration.
  2. 1+ years of production experience with Docker and Kubernetes.
  3. Experienced with public cloud platforms (GCP, AWS, Azure); GCP is a big plus.
  4. Experienced with Bash scripting or Python.
  5. Experienced with at least one monitoring tool (Thanos/Prometheus/Grafana is a big plus).
  6. Experienced with at least one log gathering tool (ELK/EFK/Loki+Promtail+Grafana).
  7. Experienced with CI/CD pipelines.
  8. Experienced with Git.
【Pluses】
  1. Experience with, or a background in, machine learning or data science.
  2. Experience developing AI-related products or projects.
  3. Experience with CNCF projects such as Helm, Istio, Argo, or Thanos.
  4. Experience in relational database administration.

 

To apply for this job email your details to amy.chen@ikala.tv