Job Description
Haptiq is a leader in AI-powered enterprise operations, delivering digital solutions and consulting services that drive value and transform businesses. We specialize in using advanced technology to streamline operations, improve efficiency, and unlock new revenue opportunities, particularly within the private capital markets.
Our integrated ecosystem includes PaaS (Platform as a Service), the Core Platform, an AI-native enterprise operations foundation built to optimize workflows, surface insights, and accelerate value creation across portfolios; SaaS (Software as a Service), a cloud platform delivering unmatched performance, intelligence, and execution at scale; and S&C (Solutions and Consulting Suite), modular technology playbooks designed to manage, grow, and optimize company performance. With over a decade of experience supporting high-growth companies and private equity-backed platforms, Haptiq brings deep domain expertise and a proven ability to turn technology into a strategic advantage.
About the Role
We’re seeking a skilled MLOps / AIOps Engineer to lead the deployment, operation, and monitoring of AI services in production. You’ll operate at the intersection of infrastructure engineering and AI systems, ensuring our AI-powered APIs, RAG pipelines, MCPs, and agentic services run reliably, securely, and at scale. You’ll collaborate closely with ML Engineers, Python Developers, and AI Architects to design resilient infrastructure and operational workflows for distributed AI applications.
Key Responsibilities
- Design, provision, and maintain infrastructure-as-code for AI service deployment (using tools such as Terraform, Pulumi, or AWS CDK).
- Build and manage CI/CD pipelines for deploying AI APIs, RAG pipelines, MCP services, and LLM agent workflows.
- Implement and maintain operational and LLM observability through monitoring and alerting systems.
- Track AI-specific operational metrics, including inference latency, error rates, drift detection, and hallucination monitoring.
- Optimize inference workloads and manage distributed AI serving frameworks (Ray Serve, BentoML, vLLM, Hugging Face TGI, etc.).
- Collaborate with ML Engineers and Python Developers to define scalable, secure, and automated deployment processes.
- Enforce operational standards for AI system security, data governance, and compliance.
- Stay current with evolving AIOps and LLM observability frameworks, integrating emerging tools and best practices into our stack.
Required Skills & Experience
- Proficiency with cloud infrastructure (AWS, Azure, or GCP) and container orchestration platforms (Docker, Kubernetes, ECS/EKS).
- Hands-on experience deploying and managing AI/ML services in production.
- Strong understanding of CI/CD pipelines for AI services, LLM workflows, and model deployments.
- Experience working with distributed AI serving frameworks and inference optimization strategies.
- Solid grasp of observability practices, operational monitoring, incident response, and AI-specific performance tracking.
- Familiarity with defining and maintaining AI system health metrics, dashboards, and alerts.
- Awareness of AI security considerations, data protection policies, and operational governance requirements.
- Curiosity and openness to adopting emerging AIOps, LLM observability, and AI infrastructure tools.
Why Join Us?
- Flexible work arrangements (including hybrid mode).
- A great Paid Time Off (PTO) policy.
- Opportunities for professional growth and development.
- A supportive, dynamic, and inclusive work environment.
We value creative problem solvers who learn fast, work well in an open and diverse environment, and enjoy pushing the bar for success ever higher. We do work hard, but we also choose to have fun while doing it.
The compensation range for this role is $120,000 to $140,000 CAD.
Seniority level
Mid-Senior level
Employment type
Full-time
Job function
Engineering and Information Technology
Industries
Software Development
Company
Haptiq
Location
Toronto
Country
Canada
Salary
150.000