Senior Site Reliability Engineer

UneeQ, New Zealand

Experience: 1 Year
Traveling: No
Telecommute: No
Qualification: As mentioned in job details
Total Vacancies: 1
Posted on: Oct 5, 2023
Last Date: Nov 5, 2023

Job Description

Purpose

UneeQ is the global standard for digital humans, enabling creators and brands to bring impactful interaction into our digital world. We are seeking a Senior Site Reliability Engineer to work as part of the SRE team to ensure our platform is scalable, resilient, and reliable by:

  • Providing an infrastructure platform to the development team that handles scaling, failover, and monitoring.
  • Managing the security and availability of that infrastructure platform.
  • Recommending infrastructure solutions for business and customer requirements in a way that balances business needs with engineering and architecture best practices.
  • Collaborating across multiple internal teams to design and develop infrastructure solutions supporting our business and technical strategies.
  • Facilitating change management via automated CI/CD pipelines.
  • Providing tooling to the development team to create a build and release pipeline that adheres to change management best practices.
  • Maintaining incident management processes, including on-call rosters and post-mortems.
  • Ensuring all our applications and services have measurable SLOs, and developing observability tools, frameworks, and processes.
  • Monitoring infrastructure costs against an agreed budget and working with technical and business stakeholders to align expectations and address discrepancies.
  • Working with the development team to expose metrics and open up monitoring opportunities.


This role is based in New Zealand and reports to the Lead SRE. UneeQ is a remote-first workplace, meaning you will mostly be working from home.

What you’re trying to achieve
  • Increase development team efficiency by providing infrastructure and tooling that makes their lives easier.
  • Ensure that UneeQ meets or exceeds availability, performance, and security SLAs.
  • Ensure that our processes (security, change management, incident management, etc.) adhere to best practices.
  • Ensure we can report to stakeholders about our system performance and availability.
  • Use vendors to save time and effort while keeping track of the infrastructure spend.
  • Maintain a culture and habit of continuous improvement.
How we’ll measure success

The primary qualitative metric is the perception of adding value to the rest of the team, which is assessed regularly via peer feedback. Performance against quantitative SLAs includes:

  • Availability
  • Average time to respond
  • Average time to repair
  • Spend against budget, etc.
General competencies that will help you
  • Empathy for peers and stakeholders
  • Attention to detail
  • Systems thinking
  • Process maturity building
  • Hunger for learning
  • Desire to be of service to others
  • Can-do attitude
  • Healthy skepticism

Requirements

Specific capabilities that will be necessary

  • Knowledge of current industry trends, tooling, and best practices for infrastructure management, application deployment, scalability, and observability.
  • Knowledge of change management, incident management, and continual improvement best practices.
  • Experience creating and maintaining complex configuration-as-code infrastructure projects.
  • Sufficient understanding of foundational programming concepts and patterns to automate routine infrastructure tasks.
  • Ability to use and develop models to support capacity planning and budget management.
  • Experience with, and good working knowledge of, Terraform.
  • Experience designing and implementing cloud-based solutions using AWS or Azure.
  • Solid understanding of container orchestration and experience using Kubernetes or a similar platform.
  • Experience designing, implementing, and maintaining solutions in some (not all) of the following fields:
    • HTTP/WebRTC applications infrastructure.
    • Infrastructure for audio and video streaming.
    • Database and message broker (MySQL, PostgreSQL, Redis, RabbitMQ, Kafka) management.
    • Observability using Prometheus and Grafana or a similar technology.
    • CI/CD pipelines, preferably for Go/C++/JavaScript.
    • Bash or Python scripting for Linux systems.
    • User provisioning.
    • Permissions management.

Benefits

  • Competitive compensation
  • 100% of employee health insurance premiums covered (including vision and dental)
  • Annual learning allowance to support your continued development and growth
  • Fully remote working

UneeQ

Information Technology and Services - Auckland, New Zealand