ZenML

docs.zenml.io
AI & Machine Learning

An MLOps framework for machine learning pipelines that run anywhere: AWS SageMaker, GCP Vertex AI, Kubeflow Pipelines with MLflow, and more!

llms.txt

ZenML - Bridging the gap between ML & Ops

ZenML

Kitaru

  • Welcome to Kitaru: The runtime layer underneath your agent stack.
  • Installation: Install Kitaru with uv or pip
  • Quickstart: Run your first durable agent flow with Kitaru
  • Deploy: Move from local development to running agents in production
  • Examples: Runnable Kitaru examples — start with the Agent Harness Platform tour, or jump to a feature-focused example
  • Troubleshooting: Diagnose problems and reset Kitaru state
  • Overview: A runnable reference architecture for building an internal agent harness platform with Kitaru and PydanticAI
  • Durable Agent: PydanticAI runs the agent loop. Kitaru keeps a durable record of the work that finished before a crash.
  • Sandbox: Run the agent's shell commands inside a Docker sandbox, so a mistaken command hits a throwaway container instead of your host.
  • Skills: Move the agent's procedure out of the system prompt and into a markdown file an operator can edit without changing code.
  • Credential Proxy: A separate proxy container holds the service credentials and injects auth headers; the worker never holds them
  • Typed Services: Add exec_service for structured host-side calls (look up a record, create a ticket, publish a summary) when a shell command is the wrong shape
  • Human in the Loop: ask_question, a freeform HITL tool that pauses the flow until an operator answers from any surface
  • Production Notes: Which pieces of the Agent Harness Platform tour are teaching stand-ins, where each one plugs into production, and what to harden before you rely on the pattern
  • Overview: The mental model behind Kitaru's durable execution primitives.
  • Harness, Runtime, Platform: Where Kitaru fits — and doesn't — in an agent stack.
  • How It Works: What runs where when you execute a Kitaru flow — server, runner, execution targets, and the contract between them.
  • Flows: Define durable execution boundaries for your AI agent workflows.
  • Deployments: Version and share durable flow entrypoints for remote invocation.
  • Checkpoints: Durable work units with persistence and concurrency support.
  • Wait, Input, and Resume: Pause flows for human or agent input, then resume from where they left off.
  • Logging and Metadata: Attach structured data to executions and checkpoints.
  • Configuration: Kitaru config directory, execution defaults, environment variables, and precedence
  • Authentication: Service accounts, API keys, and short-lived bearer tokens for Kitaru servers
  • Deploy and Invoke Flows: A practical producer-consumer guide to deploying Kitaru flows, moving tags, and invoking stable or canary routes
  • Containerization: How Kitaru builds and configures container images for remote execution
  • Execution Management: Inspect execution status, fetch runtime logs, resolve waits, and manage lifecycle actions
  • View Execution Runtime Logs: Retrieve execution and checkpoint runtime logs from the SDK, CLI, and MCP
  • Checkpoint Live Events: Publish and watch best-effort live progress and custom events from running checkpoints
  • Replay and Overrides: Replay executions from checkpoints with flow and checkpoint overrides
  • Wait, Input, and Resume: Suspend a flow for external input and continue the same execution
  • Artifacts: Persist named values in checkpoints and reuse them across executions
  • Error Handling: Understand Kitaru exception types and failure journaling
  • Tracked LLM Calls: Use kitaru.llm() with model aliases, transported runtime config, and optional secret-backed credentials
  • Secrets and Model Registration: Store provider credentials, register a model alias, and use kitaru.llm() inside a flow
  • Secrets: Create, inspect, list, and delete centralized secrets from the Kitaru CLI and Python SDK
  • Choose an Adapter: Pick the Kitaru integration path for your existing agent harness
  • Overview: Use Kitaru with PydanticAI, OpenAI Agents, Claude Agent SDK, Gemini Interactions, and LangGraph.
  • Pydantic AI: Make any PydanticAI agent replayable, resumable, and observable by wrapping it once with KitaruAgent
  • OpenAI Agents: Wrap an OpenAI Agents SDK Agent with KitaruRunner so calls are durable and replayable inside Kitaru flows
  • Claude Agent SDK: Wrap Claude Agent SDK invocations in Kitaru checkpoints, capture session context, and replay completed Claude calls honestly
  • Gemini Interactions: Make Gemini Interactions API turns replayable and observable with Kitaru checkpoints, including Antigravity managed-agent runs
  • LangGraph: Run LangGraph graphs inside Kitaru flows with either coarse graph-call checkpoints or granular LangChain call checkpoints
  • Docker: Deploy the Kitaru server using Docker or Docker Compose
  • Helm: Deploy the Kitaru server on Kubernetes using the Kitaru Helm chart
  • Overview: Create, inspect, switch, and delete the stacks Kitaru uses for execution
  • Kubernetes Stacks: Create, inspect, use, and clean up Kubernetes-backed stacks in Kitaru
  • Vertex Stacks: Create, inspect, and use Vertex AI-backed stacks with GCS storage
  • SageMaker Stacks: Create, inspect, and use SageMaker-backed stacks with S3 storage
  • AzureML Stacks: Create, inspect, and use AzureML-backed stacks with Azure Blob storage
  • Log Store: Set, inspect, and reset Kitaru's global runtime log-store backend
  • MCP Server: Query and manage Kitaru executions, deployments, artifacts, stacks, and secret creation through Model Context Protocol tools
  • Claude Code Skill: Install the zenml-io/kitaru-skills package for Kitaru quickstarts, workflow authoring, and adapter migrations
  • Contributing: How to contribute to Kitaru.

Learn

ZenML Pro

  • Introduction: Learn about the ZenML Pro features and deployment scenarios.
  • System Architecture: Understanding ZenML Pro services and how they communicate.
  • Scenarios: Compare ZenML Pro deployment scenarios to find the right fit for your organization.
  • SaaS: Learn about ZenML Pro SaaS deployment - the fastest way to get started with production-ready MLOps.
  • Hybrid: Learn about ZenML Pro Hybrid SaaS deployment - balancing control with convenience for enterprise MLOps.
  • Self-hosted: Learn about ZenML Pro Self-hosted deployment - complete control and data sovereignty for the strictest security requirements.
  • Deployment Details: Reference documentation for deploying ZenML Pro components.
  • Prerequisites: Prepare for deploying the ZenML Pro control plane and/or workspace servers in a self-hosted environment.
  • Control Plane: Configuration reference for the ZenML Control Plane.
  • Kubernetes with Helm: Deploy ZenML Pro Self-hosted on Kubernetes with Helm - complete self-hosted setup with no external dependencies.
  • Workspace Server: Configuration reference for the ZenML Workspace Server.
  • Enroll Workspaces: Enroll a ZenML Pro workspace in the ZenML Pro control plane
  • Kubernetes with Helm: Deploy ZenML Pro workspaces on Kubernetes with Helm and enroll them in the ZenML Pro control plane
  • AWS ECS: Deploy ZenML Pro Hybrid on AWS ECS with a managed control plane.
  • Enable Snapshot Support: Enable snapshot support for self-hosted ZenML Pro workspaces
  • Enable Event Triggers and Schedules: Enable ZenML Pro event triggers and schedules (scheduler and executor microservices) for self-hosted workspace servers on Kubernetes.
  • Enable Resource Pools: Enable the ZenML Pro resource pool reconciler microservice for self-hosted workspace servers on Kubernetes.
  • Single Sign-On (SSO): Configure Single Sign-On (SSO) authentication for ZenML Pro self-hosted deployments.
  • User Accounts: Understand and manage user accounts in ZenML Pro self-hosted deployments.
  • Upgrades and Updates: How to upgrade ZenML Pro components.
  • Control Plane: How to upgrade the ZenML Control Plane.
  • Workspace Server: How to upgrade ZenML Workspace Servers.
  • Hierarchy: Understanding ZenML's hierarchical structure
  • Organizations: Manage organizations in ZenML
  • Workspaces: Learn how to use workspaces in ZenML Pro.
  • Projects: Managing projects in ZenML
  • Teams: Learn about Teams in ZenML Pro and how they can be used to manage groups of users across your organization and workspaces.
  • Snapshots: Trigger pipelines from the dashboard, SDK, CLI, or REST API.
  • Triggers: Trigger pipelines by schedule or event.
  • Resource Pools: Fair GPU and compute sharing for AI/ML teams: dependable production capacity, shared pools, idle reuse, and workspace-level quotas.
  • Core Concepts: Precise definitions for ZenML Pro resource pools, subject policies, and resource requests.
  • Reconciliation Process: How the resource pool reconciliation process works in ZenML Pro.
  • Examples: Step-by-step ZenML Pro resource pool examples: pool JSON, policy JSON, ResourceSettings, and outcomes for new users.
  • Roles & Permissions: Learn about the different roles and permissions you can assign to your team members in ZenML Pro.
  • Trusted domains: Organization trusted domains in ZenML Pro — user visibility, invitations, SSO, and how operators configure them.
  • Personal Access Tokens: Learn how to manage and use Personal Access Tokens.
  • Service Accounts: Learn how to manage and use service accounts and API keys.
  • Secrets Stores: Learn how to link your own secrets store backend to your ZenML Pro workspace.

Stacks

API Reference

SDK Reference

Changelog

  • Overview: Stay up to date with the latest features, improvements, and fixes across ZenML OSS and ZenML Pro.
  • Server & SDK: Changelog for ZenML OSS and ZenML UI.
  • Pro Control Plane: Changelog for ZenML Pro.

Agent Instructions: Querying This Documentation

If you need additional information, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on a page URL with the ask query parameter:

GET https://docs.zenml.io/getting-started/introduction.md?ask=<question>

The question should be specific, self-contained, and written in natural language. The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
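The GET request above can be sketched in Python using only the standard library. This is a minimal sketch under a few assumptions: the helper names `build_ask_url` and `ask_docs` are hypothetical, the question is percent-encoded with `%20` for spaces, any query string already on the page URL is preserved, and the response body is treated as plain text because its exact format is not specified here.

```python
from urllib.parse import quote, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def build_ask_url(page_url: str, question: str) -> str:
    """Hypothetical helper: append ask=<question> to a docs page URL.

    The question is percent-encoded (spaces become %20, '?' becomes %3F),
    and any query string already present on page_url is kept.
    """
    parts = urlsplit(page_url)
    ask_query = urlencode({"ask": question}, quote_via=quote)
    query = f"{parts.query}&{ask_query}" if parts.query else ask_query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def ask_docs(page_url: str, question: str, timeout: float = 30.0) -> str:
    """Hypothetical helper: perform the GET request and return the body as text.

    Assumes the server answers with a text response; the exact format
    (markdown, JSON, or otherwise) is an assumption.
    """
    with urlopen(build_ask_url(page_url, question), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


if __name__ == "__main__":
    url = build_ask_url(
        "https://docs.zenml.io/getting-started/introduction.md",
        "How do I enable snapshot support?",
    )
    print(url)
    # ask_docs(...) would send the request over the network.
```

Encoding the question, rather than pasting it into the URL as-is, matters because natural-language questions usually contain spaces and `?`, which would otherwise break the query string.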
