Site Reliability Engineer

September 5, 2025

Job Description

Role & Responsibilities:

We are looking for a Site Reliability Engineer (SRE) to join an enabling team responsible for managing all infrastructure supporting Canada’s largest quick-service restaurant (QSR). This role is ideal for a hands-on engineer with broad cloud experience who thrives in agile environments and enjoys working across the full lifecycle of infrastructure automation and security.

Responsibilities include:

  • Design, implement, and manage infrastructure automation solutions using Infrastructure as Code (Terraform) to ensure consistency and reliability across environments.
  • Support and enhance CI/CD pipelines (GitHub Actions, CircleCI) for efficient, reliable software deployments.
  • Monitor application and infrastructure performance using observability tools such as Datadog, addressing issues proactively and optimizing system performance.
  • Collaborate with development and operations teams to improve deployment efficiency and reliability.
  • Implement and maintain security best practices, ensuring compliance with standards and protocols.
  • Participate in incident response and root cause analysis, and develop preventative measures.
  • Support operational excellence initiatives to improve system health, reduce downtime, and enhance user experience.
  • Create and maintain technical documentation, operational runbooks, and infrastructure standards.
  • Participate in architectural and production readiness reviews.
  • Assist leadership with cost engineering and capacity planning.
  • Debug production issues across the full stack and promote automation for operational challenges.
  • Participate in scheduled on-call rotations to ensure system reliability.

We are committed to creating diverse and inclusive environments where everyone can bring their full, authentic selves to work. We are an equal opportunity employer that considers all qualified candidates regardless of race, ethnicity, religion, sex, sexual orientation, gender identity, marital status, national origin, age, disability, veteran status, or other protected characteristics. For accommodations during the application process, contact […]. For privacy information, review our Workforce Privacy Policy: […]

Skills and Requirements:

  • Minimum of 4 years of hands-on experience in cloud infrastructure engineering.
  • Strong experience with serverless architecture and monitoring in AWS, Azure, or GCP.
  • Experience managing AWS infrastructure beyond EC2, particularly Lambda, DynamoDB, and EKS.
  • Experience setting up CI/CD pipelines independently.
  • Proficiency with Terraform for infrastructure provisioning.
  • Understanding of infrastructure and network security.
  • Strong communication skills for cross-functional collaboration.
  • Experience working in agile environments.
  • Experience managing core infrastructure components such as IAM and networking.
  • Programming skills in Python (preferred), TypeScript, or Go.
  • Familiarity with AI-assisted coding tools like Cursor.
  • AWS Certified DevOps Engineer or Solutions Architect certification.

Company

Insight Global

Location

Toronto

Country

Canada

Salary

125,000

URL

https://en-ca.whatjobs.com/coopob__cpl___291_2640113__3337?utm_source=3337&utm_medium=feed&keyword=Site-Reliability-Engineer&location=Toronto&geoID=6225