Machine Learning Ops / AI Ops Engineer | Data Science / AI | Toronto, Canada

September 5, 2025

Job Description

Haptiq is a leader in AI-powered enterprise operations, delivering digital solutions and consulting services that drive value and transform businesses. We specialize in using advanced technology to streamline operations, improve efficiency, and unlock new revenue opportunities, particularly within the private capital markets.

Our integrated ecosystem includes PaaS – Platform as a Service, the Core Platform, an AI-native enterprise operations foundation built to optimize workflows, surface insights, and accelerate value creation across portfolios; SaaS – Software as a Service, a cloud platform delivering unmatched performance, intelligence, and execution at scale; and S&C – Solutions and Consulting Suite, modular technology playbooks designed to manage, grow, and optimize company performance. With over a decade of experience supporting high-growth companies and private equity-backed platforms, Haptiq brings deep domain expertise and a proven ability to turn technology into a strategic advantage.

About the Role

We’re seeking a skilled MLOps / AIOps Engineer to lead the deployment, operation, and monitoring of AI services in production. You’ll operate at the intersection of infrastructure engineering and AI systems, ensuring our AI-powered APIs, RAG pipelines, MCPs, and agentic services run reliably, securely, and at scale. You’ll collaborate closely with ML Engineers, Python Developers, and AI Architects to design resilient infrastructure and operational workflows for distributed AI applications.

Key Responsibilities

  • Design, provision, and maintain infrastructure-as-code for AI service deployment (using tools like Terraform, Pulumi, AWS CDK).
  • Build and manage CI/CD pipelines for deploying AI APIs, RAG pipelines, MCP services, and LLM agent workflows.
  • Implement and maintain operational and LLM observability through monitoring and alerting systems.
  • Track AI-specific operational metrics, including inference latency, error rates, drift detection, and hallucination monitoring.
  • Optimize inference workloads and manage distributed AI serving frameworks (Ray Serve, BentoML, vLLM, Hugging Face TGI, etc.).
  • Collaborate with ML Engineers and Python Developers to define scalable, secure, and automated deployment processes.
  • Enforce operational standards for AI system security, data governance, and compliance.
  • Stay current with evolving AIOps and LLM observability frameworks, integrating emerging tools and best practices into our stack.

Required Skills & Experience

  • Proficiency with cloud infrastructure (AWS, Azure, or GCP) and container orchestration platforms (Docker, Kubernetes, ECS/EKS).
  • Hands-on experience deploying and managing AI/ML services in production.
  • Strong understanding of CI/CD pipelines for AI services, LLM workflows, and model deployments.
  • Experience working with distributed AI serving frameworks and inference optimization strategies.
  • Solid grasp of observability practices, operational monitoring, incident response, and AI-specific performance tracking.
  • Familiarity with defining and maintaining AI system health metrics, dashboards, and alerts.
  • Awareness of AI security considerations, data protection policies, and operational governance requirements.
  • Curiosity and openness to adopting emerging AIOps, LLM observability, and AI infrastructure tools.

Why Join Us?

We value creative problem solvers who learn fast, work well in an open and diverse environment, and enjoy raising the bar for success ever higher. We work hard, but we also choose to have fun while doing it.

Posted on June 27, 2025

Haptiq does not discriminate on the basis of race, sex, color, religion, age, national origin, marital status, disability, veteran status, genetic information, sexual orientation, gender identity or any other reason prohibited by law in provision of employment opportunities and benefits.

Company

Heart Talent

Location

Toronto

Country

Canada

Salary

150.000

URL

https://en-ca.whatjobs.com/coopob__cpl___291_2629680__3337?utm_source=3337&utm_medium=feed&keyword=Machine-Learning-Ops&location=Toronto&geoID=6225