Cog: Containers for machine learning

Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container.

You can deploy your packaged model to your own infrastructure, or to Replicate.

Highlights

  • 📦 Docker containers without the pain. Writing your own Dockerfile can be a bewildering process. With Cog, you define your environment with a simple configuration file and it generates a Docker image with all the best practices: Nvidia base images, efficient caching of dependencies, installing specific Python versions, sensible environment variable defaults, and so on.

  • 🤬️ No more CUDA hell. Cog knows which CUDA/cuDNN/PyTorch/Tensorflow/Python combos are compatible and will set it all up correctly for you.

  • Define the inputs and outputs for your model with standard Python. Then, Cog generates an OpenAPI schema and validates the inputs and outputs.

  • 🎁 Automatic HTTP inference server: Your model's types are used to dynamically generate a RESTful HTTP API using a high-performance Rust/Axum server.

  • 🚀 Ready for production. Deploy your model anywhere that Docker images run. Your own infrastructure, or Replicate.

How it works

Define the Docker environment your model runs in with cog.yaml:

build:
  gpu: true
  system_packages:
    - "libgl1"
    - "libglib2.0-0"
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"

Define how your model runs with run.py:

from cog import BaseRunner, Input, Path
import torch

class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.model = torch.load("./weights.pth")

    # The arguments and types the model takes as input
    def run(self,
          image: Path = Input(description="Grayscale input image")
    ) -> Path:
        """Run the model"""
        # preprocess() and postprocess() stand in for your own helper functions
        processed_image = preprocess(image)
        output = self.model(processed_image)
        return postprocess(output)

In the example above, the model accepts a path to an image as input and returns a path to the transformed image after running it through the model.

Now, you can run the model:

$ cog run -i image=@input.jpg
--> Building Docker image...
--> Running...
--> Output written to output.jpg

Or, build a Docker image for deployment:

$ cog build -t my-classification-model
--> Building Docker image...
--> Built my-classification-model:latest

$ docker run -d -p 5000:5000 --gpus all my-classification-model

$ curl http://localhost:5000/predictions -X POST \
    -H 'Content-Type: application/json' \
    -d '{"input": {"image": "https://.../input.jpg"}}'

Or, combine build and run via the serve command:

$ cog serve -p 8080

$ curl http://localhost:8080/predictions -X POST \
    -H 'Content-Type: application/json' \
    -d '{"input": {"image": "https://.../input.jpg"}}'

Why are we building this?

It's really hard for researchers to ship machine learning models to production.

Part of the solution is Docker, but getting it to work is complex: Dockerfiles, pre-/post-processing, Flask servers, CUDA versions. More often than not the researcher has to sit down with an engineer to get the damn thing deployed.

Andreas and Ben created Cog. Andreas used to work at Spotify, where he built tools for building and deploying ML models with Docker. Ben worked at Docker, where he created Docker Compose.

We realized that, in addition to Spotify, other companies were also using Docker to build and deploy machine learning models. Uber and others have built similar systems. So, we're making an open source version so other people can do this too.

Hit us up if you're interested in using it or want to collaborate with us. Find us on Discord, or email us at team@replicate.com.

Prerequisites

  • macOS, Linux or Windows 11. Cog works on macOS, Linux and Windows 11 with WSL 2.
  • Docker. Cog uses Docker to create a container for your model. You'll need to install Docker before you can run Cog. If you install Docker Engine instead of Docker Desktop, you will need to install Buildx as well.

Install

If you're using macOS, you can install Cog using Homebrew:

brew install replicate/tap/cog

You can also download and install the latest release using our install script:

# bash, zsh, and other shells
sh <(curl -fsSL https://cog.run/install.sh)

# fish shell
sh (curl -fsSL https://cog.run/install.sh | psub)

# download with wget, then run it in a separate command
wget -qO install.sh https://cog.run/install.sh
sh ./install.sh

You can manually install the latest release of Cog directly from GitHub by running the following commands in a terminal:

sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog

Or, to install Cog inside a Docker image, add this to your Dockerfile:

RUN sh -c "INSTALL_DIR=\"/usr/local/bin\" SUDO=\"\" $(curl -fsSL https://cog.run/install.sh)"

Upgrade

If you're using macOS and you previously installed Cog with Homebrew, run the following:

brew upgrade replicate/tap/cog

Otherwise, you can upgrade to the latest version by running the same commands you used to install it.

Development

See CONTRIBUTING.md for how to set up a development environment and build from source.

Next steps

Need help?

Join us in #cog on Discord.



CLI reference

cog

Containers for machine learning.

To get started, take a look at the documentation: https://github.com/replicate/cog

Examples

   To execute a command inside a Docker environment defined with Cog:
      $ cog exec echo hello world

Options

      --debug      Show debugging output
  -h, --help       help for cog
      --no-color   Disable colored output
      --version    Show version of Cog

cog build

Build a Docker image from the cog.yaml in the current directory.

The generated image contains your model code, dependencies, and the Cog runtime. It can be run locally with 'cog run' or pushed to a registry with 'cog push'.

cog build [flags]

Examples

  # Build with default settings
  cog build

  # Build and tag the image
  cog build -t my-model:latest

  # Build without using the cache
  cog build --no-cache

  # Build with model weights in a separate layer
  cog build --separate-weights -t my-model:v1

Options

  -f, --file string                  The name of the config file. (default "cog.yaml")
  -h, --help                         help for build
      --no-cache                     Do not use cache when building the image
      --openapi-schema string        Load OpenAPI schema from a file
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
      --secret stringArray           Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file'
      --separate-weights             Separate model weights from code in image layers
  -t, --tag string                   A name for the built image in the form 'repository:tag'
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")

cog doctor

Diagnose and fix common issues in your Cog project.

NOTE: cog doctor is experimental. Behavior and checks may change in future versions.

By default, cog doctor reports problems without modifying any files. Pass --fix to automatically apply safe fixes.

cog doctor [flags]

Options

  -f, --file string   The name of the config file. (default "cog.yaml")
      --fix           Automatically apply fixes
  -h, --help          help for doctor

cog exec

Execute a command inside a Docker environment defined by cog.yaml.

Cog builds a temporary image from your cog.yaml configuration and runs the given command inside it. This is useful for debugging, running scripts, or exploring the environment your model will run in.

cog exec <command> [arg...] [flags]

Examples

  # Open a Python interpreter inside the model environment
  cog exec python

  # Run a script
  cog exec python train.py

  # Run with environment variables
  cog exec -e HUGGING_FACE_HUB_TOKEN=abc123 python download.py

  # Expose a port (e.g. for Jupyter)
  cog exec -p 8888 jupyter notebook

Options

  -e, --env stringArray              Environment variables, in the form name=value
  -f, --file string                  The name of the config file. (default "cog.yaml")
      --gpus docker run --gpus       GPU devices to add to the container, in the same format as docker run --gpus.
  -h, --help                         help for exec
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
  -p, --publish stringArray          Publish a container's port to the host, e.g. -p 8000
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")

cog init

Create a cog.yaml and run.py in the current directory.

These files provide a starting template for defining your model's environment and run interface. Edit them to match your model's requirements.

cog init [flags]

Examples

  # Set up a new Cog project in the current directory
  cog init

Options

  -h, --help   help for init

cog login

Log in to a container registry.

For Replicate's registry (r8.im), this command handles authentication through Replicate's token-based flow.

For other registries, this command prompts for username and password, then stores credentials using Docker's credential system.

cog login [flags]

Options

  -h, --help          help for login
      --token-stdin   Pass login token on stdin instead of opening a browser. You can find your Replicate login token at https://replicate.com/auth/token

cog push

Build a Docker image from cog.yaml and push it to a container registry.

Cog can push to any OCI-compliant registry. When pushing to Replicate's registry (r8.im), run 'cog login' first to authenticate.

cog push [IMAGE] [flags]

Examples

  # Push to Replicate
  cog push r8.im/your-username/my-model

  # Push to any OCI registry
  cog push registry.example.com/your-username/model-name

  # Push with model weights in a separate layer (Replicate only)
  cog push r8.im/your-username/my-model --separate-weights

Options

  -f, --file string                  The name of the config file. (default "cog.yaml")
  -h, --help                         help for push
      --no-cache                     Do not use cache when building the image
      --openapi-schema string        Load OpenAPI schema from a file
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
      --secret stringArray           Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file'
      --separate-weights             Separate model weights from code in image layers
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")

cog run

Run the model.

If 'image' is passed, it will run the model on that Docker image. It must be an image that has been built by Cog.

Otherwise, it will build the model in the current directory and run it.

cog run [image] [flags]

Examples

  # Run the model with named inputs
  cog run -i prompt="a photo of a cat"

  # Pass a file as input
  cog run -i image=@photo.jpg

  # Save output to a file
  cog run -i image=@input.jpg -o output.png

  # Pass multiple inputs
  cog run -i prompt="sunset" -i width=1024 -i height=768

  # Run against a pre-built image
  cog run r8.im/your-username/my-model -i prompt="hello"

  # Pass inputs as JSON
  echo '{"prompt": "a cat"}' | cog run --json @-

Options

  -e, --env stringArray              Environment variables, in the form name=value
  -f, --file string                  The name of the config file. (default "cog.yaml")
      --gpus docker run --gpus       GPU devices to add to the container, in the same format as docker run --gpus.
  -h, --help                         help for run
  -i, --input stringArray            Inputs, in the form name=value. if value is prefixed with @, then it is read from a file on disk. E.g. -i path=@image.jpg
      --json string                  Pass inputs as JSON object, read from file (@inputs.json) or via stdin (@-)
  -o, --output string                Output path
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
      --setup-timeout uint32         The timeout for a container to setup (in seconds). (default 300)
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
      --use-replicate-token          Pass REPLICATE_API_TOKEN from local environment into the model context
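The -i convention above (name=value, where an @ prefix means "read this from a file on disk") can be sketched in plain Python. This illustrates the documented format only and is not Cog's own parser. parse_input_flag and parse_input_flags are hypothetical names. The sketch assumes the split happens at the first =, and that converting values such as width=1024 to numbers happens later, against the model's schema.

```python
from pathlib import Path
from typing import Dict, List, Tuple, Union


def parse_input_flag(flag: str) -> Tuple[str, Union[str, Path]]:
    """Split one `-i name=value` argument; a leading '@' on the value marks a file on disk."""
    name, sep, value = flag.partition("=")  # split at the FIRST '=' only
    if not sep or not name:
        raise ValueError(f"expected name=value, got {flag!r}")
    if value.startswith("@"):
        return name, Path(value[1:])  # file input, e.g. image=@photo.jpg
    return name, value  # plain value; type conversion is left to the model's schema


def parse_input_flags(flags: List[str]) -> Dict[str, Union[str, Path]]:
    """Collect several -i flags into one {"name": value} mapping."""
    return dict(parse_input_flag(f) for f in flags)
```

Splitting at the first = keeps values that themselves contain = intact. For example, expr=a=b yields the value a=b.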

cog serve

Run an HTTP server.

Builds the model and starts an HTTP server that exposes the model's inputs and outputs as a REST API. Compatible with the Cog HTTP protocol.

cog serve [flags]

Examples

  # Start the server on the default port (8393)
  cog serve

  # Start on a custom port
  cog serve -p 5000

  # Test the server
  curl http://localhost:8393/predictions \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"input": {"prompt": "a cat"}}'

Options

  -f, --file string                  The name of the config file. (default "cog.yaml")
      --gpus docker run --gpus       GPU devices to add to the container, in the same format as docker run --gpus.
  -h, --help                         help for serve
  -p, --port int                     Port on which to listen (default 8393)
      --progress string              Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
      --upload-url string            Upload URL for file outputs (e.g. https://example.com/upload/)
      --use-cog-base-image           Use pre-built Cog base image for faster cold boots (default true)
      --use-cuda-base-image string   Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")

Deploy models with Cog

Cog containers are Docker containers that run an HTTP server for running your model. You can deploy them anywhere that Docker containers run.

The server inside Cog containers is coglet, a Rust-based inference server that handles HTTP requests, worker process management, and run execution.

This guide assumes you have a model packaged with Cog. If you don't, follow our getting started guide, or use an example model.

Getting started

First, build your model:

cog build -t my-model

You can serve your model locally with cog serve:

cog serve
# or, from a built image:
cog serve my-model

Alternatively, start the Docker container directly:

# If your model uses a CPU:
docker run -d -p 5001:5000 my-model

# If your model uses a GPU:
docker run -d -p 5001:5000 --gpus all my-model

The server listens on port 5000 inside the container (mapped to 5001 above).

To view the OpenAPI schema, open localhost:5001/openapi.json in your browser or use cURL to make a request:

curl http://localhost:5001/openapi.json

To stop the server, run:

docker kill $(docker ps -q --filter ancestor=my-model)

To run the model, call the /predictions endpoint, passing input in the format expected by your model:

curl http://localhost:5001/predictions -X POST \
    --header "Content-Type: application/json" \
    --data '{"input": {"image": "https://.../input.jpg"}}'

For more details about the HTTP API, see the HTTP API reference documentation.
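To call the server from Python instead of cURL, a minimal standard-library sketch might look like this. It assumes a container from the docker run example is listening on localhost:5001 and that the model takes a single image URL input. The function names and the example.com image URL are placeholders, not part of Cog.

```python
import json
import urllib.request


def build_prediction_request(base_url: str, inputs: dict) -> urllib.request.Request:
    """Build a POST /predictions request whose JSON body is {"input": {...}}."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_prediction(base_url: str, inputs: dict, timeout: float = 60.0) -> dict:
    """Send the request to a running server and decode its JSON response."""
    request = build_prediction_request(base_url, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, run_prediction("http://localhost:5001", {"image": "https://example.com/input.jpg"}) sends the same request as the cURL command. Because the request is built separately from the call, you can inspect it without a running server.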

Health checks

The server exposes a GET /health-check endpoint that returns the current status of the model container. Use this for readiness probes in orchestration systems like Kubernetes.

curl http://localhost:5001/health-check

The response includes a status field with values like STARTING, READY, BUSY, SETUP_FAILED, or DEFUNCT. See the HTTP API reference for full details.
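A deployment script can poll this endpoint until the model is ready. The sketch below reads the status field and waits for READY. Its handling of the other statuses is an assumption, not documented behaviour: it retries on STARTING and BUSY, and stops on SETUP_FAILED or DEFUNCT. The polling interval and attempt limit are arbitrary choices.

```python
import json
import time
import urllib.request
from typing import Callable

READY = "READY"
FAILED = {"SETUP_FAILED", "DEFUNCT"}  # assumption: treated as "stop waiting"


def fetch_status(base_url: str) -> str:
    """GET /health-check and return the `status` field of the JSON response."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/health-check", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))["status"]


def wait_until_ready(fetch: Callable[[], str], interval: float = 1.0, max_attempts: int = 300,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll `fetch` until it reports READY; raise on a failed state or after max_attempts."""
    for _ in range(max_attempts):
        status = fetch()
        if status == READY:
            return status
        if status in FAILED:
            raise RuntimeError(f"model container reported {status}")
        sleep(interval)  # STARTING, BUSY or anything unknown: try again
    raise TimeoutError(f"model not READY after {max_attempts} attempts")
```

Against a real container you would call wait_until_ready(lambda: fetch_status("http://localhost:5001")). Right after docker run, the port may not accept connections yet. In that case, fetch_status raises a connection error that you may want to catch and retry.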

Concurrency

By default, the server processes one run at a time. To enable concurrent runs, set the concurrency.max option in cog.yaml:

concurrency:
  max: 4

See the cog.yaml reference for more details.

Environment variables

You can configure runtime behavior with environment variables:

  • COG_SETUP_TIMEOUT: Maximum time in seconds for the setup() method (default: no timeout).

See the environment variables reference for the full list.


Environment variables

This reference lists the public Cog-specific environment variables that change how Cog behaves.

Build-time variables

COG_SDK_WHEEL

Controls which Cog Python SDK wheel is installed in the Docker image during cog build. Takes precedence over build.sdk_version in cog.yaml.

Supported values:

Value                 Description
pypi                  Install latest version from PyPI
pypi:0.12.0           Install specific version from PyPI
dist                  Use wheel from dist/ directory (requires git repo)
https://...           Install from URL
/path/to/wheel.whl    Install from local file path

Default behaviour:

  • Release builds install the latest Cog SDK from PyPI.
  • Development builds auto-detect a wheel in dist/, then fall back to the latest Cog SDK from PyPI.

$ COG_SDK_WHEEL=pypi:0.11.0 cog build
$ COG_SDK_WHEEL=dist cog build
$ COG_SDK_WHEEL=https://example.com/cog-0.12.0-py3-none-any.whl cog build
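Simple prefix checks are enough to tell the five value forms in the table apart. The sketch below only classifies a value and installs nothing. classify_sdk_wheel is a hypothetical helper, and falling back to "local file" for anything unrecognised is an assumption. COGLET_WHEEL accepts the same values, so the same checks would apply to it.

```python
from typing import Tuple


def classify_sdk_wheel(value: str) -> Tuple[str, str]:
    """Map a COG_SDK_WHEEL / COGLET_WHEEL value to (source, detail)."""
    if value == "pypi":
        return ("pypi", "latest")
    if value.startswith("pypi:"):
        return ("pypi", value[len("pypi:"):])  # pinned version, e.g. 0.12.0
    if value == "dist":
        return ("dist", "")  # search the dist/ directories
    if value.startswith("https://"):
        return ("url", value)
    return ("file", value)  # assumption: anything else is a local wheel path
```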

The dist option searches for wheels in:

  1. ./dist/ (current directory)
  2. $REPO_ROOT/dist/ (if REPO_ROOT is set)
  3. <git-repo-root>/dist/ (via git rev-parse, useful when running from subdirectories)
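The search order above can be expressed as a list of candidate directories. The sketch uses git rev-parse --show-toplevel, as the third entry describes. Function names are hypothetical, and removing duplicates (for example, when REPO_ROOT and the git root are the same directory) is an assumption about sensible behaviour, not documented.

```python
import subprocess
from pathlib import Path
from typing import List, Mapping, Optional


def git_repo_root(cwd: Path) -> Optional[Path]:
    """Return the output of `git rev-parse --show-toplevel`, or None outside a git repo."""
    try:
        result = subprocess.run(["git", "rev-parse", "--show-toplevel"], cwd=cwd,
                                capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None  # git missing, cwd missing, or not inside a repository
    return Path(result.stdout.strip())


def dist_search_dirs(cwd: Path, env: Mapping[str, str], git_root: Optional[Path]) -> List[Path]:
    """Directories to search for wheels, in the documented order, duplicates removed."""
    candidates = [cwd / "dist"]
    if env.get("REPO_ROOT"):
        candidates.append(Path(env["REPO_ROOT"]) / "dist")
    if git_root is not None:
        candidates.append(git_root / "dist")
    ordered: List[Path] = []
    for directory in candidates:
        if directory not in ordered:
            ordered.append(directory)
    return ordered
```

A typical call would be dist_search_dirs(Path.cwd(), os.environ, git_repo_root(Path.cwd())).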

COGLET_WHEEL

Controls which coglet wheel is installed in the Docker image. Coglet is the Rust-based inference server.

Supported values: Same as COG_SDK_WHEEL.

Default behaviour: For development builds, auto-detects a wheel in dist/. For release builds, installs the latest version from PyPI.

$ COGLET_WHEEL=dist cog build
$ COGLET_WHEEL=pypi:0.1.0 cog build

COG_CA_CERT

Injects a custom CA certificate into the Docker image during cog build. This is useful when building behind a corporate proxy or VPN that uses custom certificate authorities (for example, Cloudflare WARP).

Supported values:

Value                             Description
/path/to/cert.crt                 Path to a single PEM certificate file
/path/to/certs/                   Directory of .crt and .pem files (all are concatenated)
-----BEGIN CERTIFICATE-----...    Inline PEM certificate
LS0tLS1CRUdJTi...                 Base64-encoded PEM certificate

The certificate is installed into the system CA store and the SSL_CERT_FILE and REQUESTS_CA_BUNDLE environment variables are set automatically in the built image.

$ COG_CA_CERT=/usr/local/share/ca-certificates/corporate-ca.crt cog build
$ COG_CA_CERT=/etc/custom-certs/ cog build
$ COG_CA_CERT="$(cat /path/to/cert.pem)" cog build
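The four accepted forms can be distinguished in turn: inline PEM, directory, file, then base64. The sketch turns any of them into PEM text. The order of the checks and the error messages are assumptions. load_ca_bundle is a hypothetical helper and does not install anything into a CA store.

```python
import base64
import binascii
import os

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def load_ca_bundle(value: str) -> str:
    """Return PEM text for a COG_CA_CERT value given in any of the four forms above."""
    if value.lstrip().startswith(PEM_HEADER):
        return value  # inline PEM certificate
    if os.path.isdir(value):
        # Directory: concatenate every .crt and .pem file, sorted for a stable order.
        names = sorted(n for n in os.listdir(value)
                       if n.endswith((".crt", ".pem")) and os.path.isfile(os.path.join(value, n)))
        parts = []
        for name in names:
            with open(os.path.join(value, name), encoding="utf-8") as f:
                parts.append(f.read())
        return "\n".join(parts)
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as f:
            return f.read()  # single PEM file
    try:
        decoded = base64.b64decode(value, validate=True).decode("ascii")
    except (binascii.Error, UnicodeDecodeError):
        raise ValueError("COG_CA_CERT is not a path, PEM text, or base64-encoded PEM") from None
    if not decoded.lstrip().startswith(PEM_HEADER):
        raise ValueError("COG_CA_CERT base64 value does not decode to a PEM certificate")
    return decoded
```

Checking for the PEM header first keeps a multi-line certificate from being mistaken for a file path.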

COG_OPENAPI_SCHEMA

Uses a pre-built OpenAPI schema instead of generating one from the configured predict or train reference.

The value must be a path to a JSON schema file. Cog reads that file during schema generation and embeds it in the built image.

$ COG_OPENAPI_SCHEMA=./openapi.json cog build

CLI and local cache variables

COG_NO_UPDATE_CHECK

Disables Cog's automatic update check. Set it to any non-empty value.

$ COG_NO_UPDATE_CHECK=1 cog build

COG_NO_COLOR

Disables coloured CLI output. Set it to any non-empty value.

Cog also honours the standard NO_COLOR environment variable.

$ COG_NO_COLOR=1 cog run -i prompt="hello"

COG_SKIP_DOCKER_CHECK

Skips the cog doctor Docker environment check. Set it to any non-empty value.

$ COG_SKIP_DOCKER_CHECK=1 cog doctor

COG_CACHE_DIR

Overrides Cog's local cache root.

Cog currently uses this cache for the content-addressed weights store. If unset, Cog uses $XDG_CACHE_HOME/cog when XDG_CACHE_HOME is set, otherwise $HOME/.cache/cog.

$ COG_CACHE_DIR=/mnt/fast-cache cog weights pull
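The fallback chain for the cache root is a straightforward precedence lookup. Below is a sketch with a hypothetical function name. It treats an empty variable as unset, which is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def cog_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the cache root: COG_CACHE_DIR, then $XDG_CACHE_HOME/cog, then $HOME/.cache/cog."""
    env = os.environ if env is None else env
    if env.get("COG_CACHE_DIR"):
        return Path(env["COG_CACHE_DIR"])
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "cog"
    return Path(env.get("HOME", str(Path.home()))) / ".cache" / "cog"
```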

Model reference and registry variables

COG_MODEL

Overrides the full model reference used by commands that need a model destination, such as cog push and weights commands.

The value is parsed as a complete model reference (registry/repo, registry/repo:tag, or registry/repo@digest). If no tag is supplied, Cog generates a timestamp tag.

When COG_MODEL is set, it takes precedence over COG_MODEL_REGISTRY, COG_MODEL_REPO, and COG_MODEL_TAG.

$ COG_MODEL=r8.im/acme/my-model:v1 cog push

COG_MODEL_REGISTRY

Overrides only the registry host of the model reference.

$ COG_MODEL_REGISTRY=registry.example.com cog push

COG_MODEL_REPO

Overrides only the repository path of the model reference. The value must not include a registry host, tag, or digest.

$ COG_MODEL_REPO=acme/my-model cog push

COG_MODEL_TAG

Overrides only the tag of the model reference.

Tags starting with cog- are reserved for tags that Cog generates internally and are rejected.

$ COG_MODEL_TAG=staging cog push
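The precedence rules for these four variables can be sketched as a small resolver. Several details are assumptions, not documented behaviour:

  • ModelRef, parse_ref and resolve_model_ref are hypothetical names.
  • The base reference stands in for whatever destination Cog would otherwise use.
  • The timestamp format is invented for illustration.
  • The registry-host check on COG_MODEL_REPO uses a Docker-style heuristic: a dot in the first path component.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelRef:
    registry: str
    repo: str
    tag: Optional[str] = None
    digest: Optional[str] = None

    def __str__(self) -> str:
        name = f"{self.registry}/{self.repo}"
        if self.digest:
            return f"{name}@{self.digest}"
        return f"{name}:{self.tag}" if self.tag else name


def parse_ref(value: str) -> ModelRef:
    """Parse registry/repo, registry/repo:tag, or registry/repo@digest."""
    name, _, digest = value.partition("@")
    registry, slash, rest = name.partition("/")
    if not slash or not rest:
        raise ValueError(f"expected registry/repo, got {value!r}")
    repo, tag = rest, ""
    if ":" in rest:
        repo, _, tag = rest.rpartition(":")
    return ModelRef(registry, repo, tag or None, digest or None)


def resolve_model_ref(base: ModelRef, env: Mapping[str, str],
                      now: Optional[datetime] = None) -> ModelRef:
    """Apply COG_MODEL, or else the per-part overrides, to a base reference."""
    if env.get("COG_MODEL"):
        ref = parse_ref(env["COG_MODEL"])  # the full reference beats the three overrides
    else:
        ref = base
        if env.get("COG_MODEL_REGISTRY"):
            ref = replace(ref, registry=env["COG_MODEL_REGISTRY"])
        if env.get("COG_MODEL_REPO"):
            repo = env["COG_MODEL_REPO"]
            # Assumption: a '.' in the first path component means a registry host was included.
            if ":" in repo or "@" in repo or "." in repo.split("/")[0]:
                raise ValueError("COG_MODEL_REPO must not include a registry host, tag, or digest")
            ref = replace(ref, repo=repo)
        if env.get("COG_MODEL_TAG"):
            tag = env["COG_MODEL_TAG"]
            if tag.startswith("cog-"):
                raise ValueError("tags starting with 'cog-' are reserved for Cog")
            ref = replace(ref, tag=tag)
    if ref.tag is None and ref.digest is None:
        # Hypothetical timestamp format; the format Cog actually uses is not documented here.
        moment = now or datetime.now(timezone.utc)
        ref = replace(ref, tag=moment.strftime("%Y%m%dT%H%M%SZ"))
    return ref
```

Splitting the registry at the first / lets hosts with ports, such as localhost:5000, pass through unchanged. The tag is then taken from after the last colon of the remaining path.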

COG_REGISTRY_HOST

Changes the default Replicate-compatible registry host used by commands such as cog login, base image resolution, and model reference resolution.

The default is r8.im.

$ COG_REGISTRY_HOST=registry.example.com cog login

Runtime server variables

These variables affect a running model server. Set them in cog.yaml under environment, pass them with cog run -e or cog serve -e, or set them when running the built Docker image.

COG_MAX_CONCURRENCY

Controls how many predictions the model server can run concurrently.

By default, Cog runs one prediction at a time. Invalid values are ignored and the default of 1 is used.

$ COG_MAX_CONCURRENCY=4 docker run -p 5000:5000 my-model

COG_SETUP_TIMEOUT

Controls the maximum time, in seconds, allowed for the model's setup() method to complete. If setup exceeds this timeout, the server reports setup failure.

By default, there is no timeout. Set to 0 to disable the timeout. Invalid values are ignored with a warning.

$ COG_SETUP_TIMEOUT=300 docker run -p 5000:5000 my-model
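The defaults for COG_MAX_CONCURRENCY and COG_SETUP_TIMEOUT can be captured in two small parsers. The server itself (coglet) is written in Rust, so this Python sketch only illustrates the documented rules. The function names are hypothetical. Treating 0 or negative concurrency, and negative timeouts, as invalid is an assumption.

```python
import warnings
from typing import Mapping, Optional


def max_concurrency(env: Mapping[str, str]) -> int:
    """COG_MAX_CONCURRENCY: default 1; invalid values fall back to the default."""
    raw = env.get("COG_MAX_CONCURRENCY")
    if raw is None:
        return 1
    try:
        value = int(raw)
    except ValueError:
        return 1
    return value if value >= 1 else 1  # assumption: 0 and negatives count as invalid


def setup_timeout(env: Mapping[str, str]) -> Optional[int]:
    """COG_SETUP_TIMEOUT in seconds; None means no timeout (unset, 0, or invalid)."""
    raw = env.get("COG_SETUP_TIMEOUT")
    if raw is None:
        return None
    try:
        seconds = int(raw)
    except ValueError:
        warnings.warn(f"ignoring invalid COG_SETUP_TIMEOUT={raw!r}")
        return None
    if seconds < 0:
        warnings.warn(f"ignoring negative COG_SETUP_TIMEOUT={raw!r}")
        return None
    return seconds or None  # 0 disables the timeout
```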

COG_LOG_LEVEL

Controls Coglet runtime log verbosity when RUST_LOG is not set.

Supported values are debug, info, warn, warning, and error. The default is info.

$ COG_LOG_LEVEL=debug docker run -p 5000:5000 my-model

COG_THROTTLE_RESPONSE_INTERVAL

Controls how often asynchronous webhook output and logs events are sent, in seconds.

The default is 0.5 seconds. Invalid values are ignored and the default is used. start and completed webhook events are always sent immediately.

$ COG_THROTTLE_RESPONSE_INTERVAL=1 docker run -p 5000:5000 my-model
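The throttling rule can be sketched as a time-based gate. start and completed always pass; output and logs pass only if at least the interval has elapsed since the last send. The sketch makes two assumptions:

  • Each webhook carries the full prediction state, so a skipped intermediate event is covered by the next one.
  • Immediate events also reset the timer.

WebhookThrottle and throttle_interval are hypothetical names. The injectable clock exists only to make the sketch testable.

```python
import time
from typing import Callable, Mapping, Optional

IMMEDIATE = {"start", "completed"}


def throttle_interval(env: Mapping[str, str]) -> float:
    """COG_THROTTLE_RESPONSE_INTERVAL in seconds; default 0.5, invalid values ignored."""
    try:
        value = float(env.get("COG_THROTTLE_RESPONSE_INTERVAL", "0.5"))
    except ValueError:
        return 0.5
    return value if value >= 0 else 0.5


class WebhookThrottle:
    """Decide whether a webhook event should be sent now."""

    def __init__(self, interval: float = 0.5, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self.last_sent: Optional[float] = None

    def should_send(self, event: str) -> bool:
        now = self.clock()
        due = self.last_sent is None or now - self.last_sent >= self.interval
        if event in IMMEDIATE or due:
            self.last_sent = now
            return True
        return False  # output/logs inside the interval: skipped until the next send
```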

COG_STREAM_HISTORY_CAPACITY

Controls how many server-sent event stream events are retained per prediction for replay when a client reconnects with Accept: text/event-stream.

By default, Cog retains the most recent 1024 events per prediction. Set to 0 to disable replay history while keeping live streaming enabled. Invalid values are ignored with a warning and the default is used.

$ COG_STREAM_HISTORY_CAPACITY=0 docker run -p 5000:5000 my-model
$ COG_STREAM_HISTORY_CAPACITY=4096 docker run -p 5000:5000 my-model
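Keeping "the most recent N events" is a bounded ring buffer, which collections.deque with maxlen provides directly. The sketch assumes that events carry increasing ids and that a reconnecting client identifies the last event it saw, as the SSE Last-Event-ID convention does. StreamHistory is a hypothetical class, not coglet's implementation.

```python
from collections import deque
from typing import Deque, List, Tuple


class StreamHistory:
    """Retain the most recent `capacity` SSE events of one prediction for replay."""

    def __init__(self, capacity: int = 1024):
        if capacity < 0:
            raise ValueError("capacity must be >= 0")
        self.capacity = capacity
        self._events: Deque[Tuple[int, str]] = deque(maxlen=capacity)  # oldest drop off first
        self._next_id = 0

    def append(self, data: str) -> int:
        """Record an event (it is also streamed live elsewhere) and return its id."""
        event_id = self._next_id
        self._next_id += 1
        if self.capacity > 0:  # capacity 0: live streaming only, no replay history
            self._events.append((event_id, data))
        return event_id

    def replay_after(self, last_event_id: int = -1) -> List[Tuple[int, str]]:
        """Retained events newer than last_event_id, oldest first."""
        return [(i, d) for i, d in self._events if i > last_event_id]
```

With a capacity of 3 and five events appended, only the last three can be replayed. A client that missed an older event cannot recover it from history.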

COG_WEIGHTS

Provides a weights path or URL to a model whose setup() method accepts a weights parameter.

$ cog run -e COG_WEIGHTS=https://example.com/weights.tar -i prompt="hello"

COG_USER_AGENT

Sets the User-Agent header used by Cog when downloading URL-backed File inputs.

$ COG_USER_AGENT="my-service/1.0" docker run -p 5000:5000 my-model

Push tuning variables

COG_PUSH_OCI

Enables Cog's OCI chunked push path for container image layers when set to 1. If the OCI push fails with a non-fatal error, Cog falls back to Docker's native push path.

$ COG_PUSH_OCI=1 cog push

COG_PUSH_CONCURRENCY

Controls how many image layers or weight blobs Cog uploads concurrently during push operations.

The default is 5. Invalid values and values less than 1 are ignored.

$ COG_PUSH_CONCURRENCY=2 cog push
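Bounded concurrent uploads map naturally onto a thread pool whose size is the concurrency limit. In this sketch, the upload function is a stand-in for whatever actually sends a layer or weight blob, and push_concurrency and upload_all are hypothetical names. Only the parsing rules (default 5; invalid values and values below 1 ignored) come from the text above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Mapping


def push_concurrency(env: Mapping[str, str]) -> int:
    """COG_PUSH_CONCURRENCY: default 5; invalid values and values < 1 are ignored."""
    try:
        value = int(env.get("COG_PUSH_CONCURRENCY", "5"))
    except ValueError:
        return 5
    return value if value >= 1 else 5


def upload_all(blobs: Iterable[str], upload: Callable[[str], str], workers: int) -> List[str]:
    """Run `upload` on every blob with at most `workers` in flight; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload, blobs))
```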

COG_PUSH_DEFAULT_CHUNK_SIZE

Sets the default multipart upload chunk size, in bytes, when the registry does not advertise a maximum chunk size.

The default is 96 MiB. Invalid values and values less than 1 are ignored.

$ COG_PUSH_DEFAULT_CHUNK_SIZE=67108864 cog push

COG_PUSH_MULTIPART_THRESHOLD

Sets the minimum blob size, in bytes, before Cog uses multipart upload.

The default is 128 MiB. Invalid values and values less than 1 are ignored.

$ COG_PUSH_MULTIPART_THRESHOLD=268435456 cog push
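With the defaults, the two size settings combine as follows. A 100 MiB blob stays below the 128 MiB threshold and goes up in one request. A 300 MiB blob is split into 96 + 96 + 96 + 12 MiB. The sketch below shows that arithmetic. plan_upload is hypothetical, and using an advertised registry maximum as-is is our reading of "when the registry does not advertise a maximum chunk size".

```python
from typing import List, Optional, Tuple

MIB = 1024 * 1024
DEFAULT_CHUNK_SIZE = 96 * MIB        # documented default of COG_PUSH_DEFAULT_CHUNK_SIZE
MULTIPART_THRESHOLD = 128 * MIB      # documented default of COG_PUSH_MULTIPART_THRESHOLD


def plan_upload(blob_size: int, registry_max_chunk: Optional[int] = None,
                default_chunk: int = DEFAULT_CHUNK_SIZE,
                threshold: int = MULTIPART_THRESHOLD) -> List[Tuple[int, int]]:
    """Split a blob into (offset, length) pieces; a single piece means no multipart upload."""
    if blob_size < threshold:
        return [(0, blob_size)]
    chunk = registry_max_chunk or default_chunk  # assumption: an advertised maximum is used as-is
    return [(offset, min(chunk, blob_size - offset)) for offset in range(0, blob_size, chunk)]
```

The example values above are byte counts: 67108864 is 64 MiB and 268435456 is 256 MiB.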

Getting started with your own model

This guide will show you how to put your own machine learning model in a Docker image using Cog. If you haven't got a model to try out, you'll want to follow the main getting started guide.

Prerequisites

  • macOS, Linux or Windows 11. Cog works on macOS, Linux and Windows 11 with WSL 2.
  • Docker. Cog uses Docker to create a container for your model. You'll need to install Docker before you can run Cog.

Initialization

First, install Cog if you haven't already:

macOS (recommended):

brew install replicate/tap/cog

Linux or macOS (manual):

sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog

To configure your project for use with Cog, you'll need to add two files:

  • cog.yaml defines system requirements, Python package dependencies, etc.
  • run.py describes the run interface for your model

Use the cog init command to generate these files in your project:

$ cd path/to/your/model
$ cog init

Define the Docker environment

The cog.yaml file defines all the different things that need to be installed for your model to run. You can think of it as a simple way of defining a Docker image.

For example:

build:
  python_version: "3.13"
  python_requirements: requirements.txt

With a requirements.txt containing your dependencies:

torch==2.6.0

This will generate a Docker image with Python 3.13 and PyTorch 2 installed, for both CPU and GPU, with the correct version of CUDA, and various other sensible best-practices.

To run a command inside this environment, prefix it with cog exec:

$ cog exec python
✓ Building Docker image from cog.yaml... Successfully built 8f54020c8981
Running 'python' in Docker with the current directory mounted as a volume...
────────────────────────────────────────────────────────────────────────────────────────

Python 3.13.x (main, ...)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

This is handy for ensuring a consistent environment for development or training.

With cog.yaml, you can also install system packages and other things. Take a look at the full reference to see what else you can do.

Define how to run your model

The next step is to update run.py to define the interface for running your model. The run.py generated by cog init looks something like this:

from cog import BaseRunner, Path, Input
import torch

class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.net = torch.load("weights.pth")

    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5)
    ) -> Path:
        """Run the model"""
        # ... pre-processing (e.g. load `image` and convert it to the tensor your model expects) ...
        output = self.net(image)
        # ... post-processing ...
        return output

Edit your run.py file and fill in the functions with your own model's setup and run code. You might need to import parts of your model from another file.

You also need to define the inputs to your model as arguments to the run() function, as demonstrated above. Each argument must be annotated with a type. The supported types are:

  • str: a string
  • int: an integer
  • float: a floating point number
  • bool: a boolean
  • cog.File: a file-like object representing a file (deprecated — use cog.Path instead)
  • cog.Path: a path to a file on disk

You can provide more information about the input with the Input() function, as shown above. It takes these basic arguments:

  • description: A description of what to pass to this input for users of the model
  • default: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to None, the input is optional.
  • ge: For int or float types, the value should be greater than or equal to this number.
  • le: For int or float types, the value should be less than or equal to this number.
  • min_length: For str types, the minimum length of the string.
  • max_length: For str types, the maximum length of the string.
  • regex: For str types, the string must match this regular expression.
  • choices: For str or int types, a list of possible values for this input.
  • deprecated: Mark this input as deprecated with a message explaining what to use instead.

There are some more advanced options you can pass, too. For more details, take a look at the run interface documentation.

Next, add the line run: "run.py:Runner" to your cog.yaml, so it looks something like this:

build:
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"

That's it! To test this works, try running the model:

$ cog run -i image=@input.jpg
✓ Building Docker image from cog.yaml... Successfully built 664ef88bc1f4
✓ Model running in Docker image 664ef88bc1f4

Written output to output.png

To pass more inputs to the model, you can add more -i options:

$ cog run -i image=@image.jpg -i scale=2.0

In this case it is just a number, not a file, so you don't need the @ prefix.

Using GPUs

To use GPUs with Cog, add the gpu: true option to the build section of your cog.yaml:

build:
  gpu: true
  ...

Cog will use the nvidia-docker base image and automatically figure out what versions of CUDA and cuDNN to use based on the version of Python, PyTorch, and Tensorflow that you are using.

For more details, see the gpu section of the cog.yaml reference.

Next steps

Next, you might want to take a look at:


Getting started

This guide will walk you through what you can do with Cog by using an example model.

[!TIP] Using a language model to help you write the code for your new Cog model?

Feed it https://cog.run/llms.txt, which has all of Cog's documentation bundled into a single file. To learn more about this format, check out llmstxt.org.

Prerequisites

  • macOS, Linux or Windows 11. Cog works on macOS, Linux and Windows 11 with WSL 2.
  • Docker. Cog uses Docker to create a container for your model. You'll need to install Docker before you can run Cog.

Install Cog

macOS (recommended):

brew install replicate/tap/cog

Linux or macOS (manual):

sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
sudo xattr -d com.apple.quarantine /usr/local/bin/cog 2>/dev/null || true

[!NOTE] macOS: "cannot be opened because the developer cannot be verified"

If you downloaded the binary manually (via curl or a browser) and see this Gatekeeper warning, run:

sudo xattr -d com.apple.quarantine /usr/local/bin/cog

Installing via brew install replicate/tap/cog handles this automatically.

Create a project

Let's make a directory to work in:

mkdir cog-quickstart
cd cog-quickstart

Run commands

The simplest thing you can do with Cog is run a command inside a Docker environment.

The first thing you need to do is create a file called cog.yaml:

build:
  python_version: "3.13"

Then you can run any command inside this environment. For example, run:

cog exec python

and you'll get an interactive Python shell:

✓ Building Docker image from cog.yaml... Successfully built 8f54020c8981
Running 'python' in Docker with the current directory mounted as a volume...
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Python 3.13.x (main, ...)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

(Hit Ctrl-D to exit the Python shell.)

Inside this Docker environment you can do anything – run a Jupyter notebook, your training script, your evaluation script, and so on.

Run a model

Let's pretend we've trained a model. With Cog, we can define how to run it in a standard way, so other people can easily run it without having to hunt around for a run script.

We need to write some code to describe how the model runs.

Save this to run.py:

import os
os.environ["TORCH_HOME"] = "."

import torch
from cog import BaseRunner, Input, Path
from PIL import Image
from torchvision import models

WEIGHTS = models.ResNet50_Weights.IMAGENET1K_V1


class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = models.resnet50(weights=WEIGHTS).to(self.device)
        self.model.eval()

    def run(self, image: Path = Input(description="Image to classify")) -> dict:
        """Run the model"""
        img = Image.open(image).convert("RGB")
        preds = self.model(WEIGHTS.transforms()(img).unsqueeze(0).to(self.device))
        top3 = preds[0].softmax(0).topk(3)
        categories = WEIGHTS.meta["categories"]
        return {categories[i]: p.detach().item() for p, i in zip(*top3)}

We also need to point Cog at this, and tell it what Python dependencies to install.

Save this to requirements.txt:

pillow==11.1.0
torch==2.6.0
torchvision==0.21.0

Then update cog.yaml to look like this:

build:
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"

[!TIP] If you have a machine with an NVIDIA GPU attached, add gpu: true to the build section of your cog.yaml to enable GPU acceleration.

Let's grab an image to test the model with:

IMAGE_URL=https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg
curl $IMAGE_URL > input.jpg

Now, let's run the model using Cog:

cog run -i image=@input.jpg

If you see the following output

{
  "tiger_cat": 0.4874822497367859,
  "tabby": 0.23169134557247162,
  "Egyptian_cat": 0.09728282690048218
}

then it worked!

Note: The first time you run cog run, Cog builds a Docker image that can run your model. Later runs of cog run reuse the pre-built image.

Build an image

You can bake your model's code, the trained weights, and the Docker environment into a Docker image. This image serves an HTTP server and can be deployed anywhere Docker runs to serve real-time inference.

cog build -t resnet
# Building Docker image...
# Built resnet:latest

You can run this image with cog run by passing the image name as an argument:

cog run resnet -i image=@input.jpg

Or, you can run it with Docker directly, and it'll serve an HTTP server:

docker run -d --rm -p 5000:5000 resnet

We can send inputs directly with curl:

curl http://localhost:5000/predictions -X POST \
    -H 'Content-Type: application/json' \
    -d '{"input": {"image": "https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg"}}'
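The same request can be sent from Python. The sketch below uses only the standard library's urllib.request; the helper names build_prediction_request and create_prediction are hypothetical, and it assumes the container started above is listening on localhost:5000.

```python
import json
import urllib.request


def build_prediction_request(base_url: str, inputs: dict) -> urllib.request.Request:
    """Build a synchronous POST /predictions request (hypothetical helper)."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/predictions",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def create_prediction(base_url: str, inputs: dict) -> dict:
    """Send the request and return the decoded prediction object."""
    request = build_prediction_request(base_url, inputs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

For example, create_prediction("http://localhost:5000", {"image": image_url}) blocks until the synchronous prediction finishes and returns a dict with status and output keys.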

As a shorthand, you can add the Docker image's name as an extra line in cog.yaml:

image: "r8.im/replicate/resnet"

Once you've done this, you can use cog push to build and push the image to a Docker registry:

cog push
# Building r8.im/replicate/resnet...
# Pushing r8.im/replicate/resnet...
# Pushed!

The Docker image is now accessible to anyone or any system that has access to this Docker registry.

Next steps

Those are the basics! Next, you might want to take a look at:


HTTP API

[!TIP] For information about how to run the HTTP server, see our documentation on deploying models.

When you run a Docker image built by Cog, it serves an HTTP API for making predictions.

The server supports both synchronous and asynchronous prediction creation:

  • Synchronous: The server waits until the prediction is completed and responds with the result.
  • Asynchronous: The server immediately returns a response and processes the prediction in the background.

The client can create a prediction asynchronously by setting the Prefer: respond-async header in their request or by requesting a streamed response with Accept: text/event-stream. With Prefer: respond-async, the server responds immediately after starting the prediction with 202 Accepted status and a prediction object in status starting. With Accept: text/event-stream, the server responds with 200 OK and keeps the response open as a server-sent event stream.

[!NOTE] For JSON responses, the only supported way to receive updates on the status of predictions started asynchronously is using webhooks. Polling for prediction status is not currently supported.

You can also use certain server endpoints to create predictions idempotently, such that if a client calls this endpoint more than once with the same ID (for example, due to a network interruption) while the prediction is still running, no new prediction is created. Instead, the client receives the response type requested by the retry: JSON for regular requests or a server-sent event stream for streaming requests.


Here's a summary of the prediction creation endpoints:

Endpoint                            Header                      Behavior
POST /predictions                   -                           Synchronous, non-idempotent
POST /predictions                   Prefer: respond-async       Asynchronous, non-idempotent
POST /predictions                   Accept: text/event-stream   Streaming, non-idempotent
PUT /predictions/<prediction_id>    -                           Synchronous, idempotent
PUT /predictions/<prediction_id>    Prefer: respond-async       Asynchronous, idempotent
PUT /predictions/<prediction_id>    Accept: text/event-stream   Streaming, idempotent

Choose the endpoint that best fits your needs:

  • Use synchronous endpoints when you want to wait for the prediction result.
  • Use asynchronous endpoints when you want to start a prediction and receive updates via webhooks.
  • Use streaming endpoints when you want to receive prediction lifecycle events over the HTTP response as they happen.
  • Use idempotent endpoints when you need to safely retry requests without creating duplicate predictions.
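The endpoint table and the guidance above boil down to two independent choices: the method and path (POST, or PUT with an ID for idempotency) and the header (none, Prefer, or Accept). A small sketch of that mapping; the function name request_spec and the mode strings "sync", "async", and "stream" are hypothetical.

```python
from typing import Optional

# Extra header for each delivery mode; synchronous requests need none.
MODE_HEADERS = {
    "sync": {},
    "async": {"Prefer": "respond-async"},
    "stream": {"Accept": "text/event-stream"},
}


def request_spec(mode: str, prediction_id: Optional[str] = None) -> tuple[str, str, dict]:
    """Return (method, path, extra_headers) for a prediction request.

    Passing a prediction_id selects the idempotent PUT endpoint;
    omitting it selects the non-idempotent POST endpoint.
    """
    if mode not in MODE_HEADERS:
        raise ValueError(f"unknown mode: {mode!r}")
    headers = dict(MODE_HEADERS[mode])
    if prediction_id is None:
        return "POST", "/predictions", headers
    return "PUT", f"/predictions/{prediction_id}", headers
```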

Streaming predictions with server-sent events

To produce streamed prediction events, the model must return an iterator and opt in to SSE streaming with the streaming decorator.

from typing import Iterator

from cog import BaseRunner, Input, streaming


class Runner(BaseRunner):
    @streaming
    def run(self, prompt: str = Input(description="Prompt")) -> Iterator[str]:
        for token in generate_tokens(prompt):
            yield token

If you import the cog module itself rather than the streaming name, write the decorator as @cog.streaming. The parenthesized forms @cog.streaming() and @streaming() are also accepted. Without the decorator, iterator outputs still work in normal JSON responses, but requests with Accept: text/event-stream return 406 Not Acceptable.

To consume a streamed prediction, send the prediction request with Accept: text/event-stream:

POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Accept: text/event-stream

{
    "input": {"prompt": "Write a haiku about onions"}
}

The server starts the prediction asynchronously and keeps the HTTP response open as a server-sent event stream. Each event has an event name and a JSON data payload:

event: start
data: {"id":"abc123","status":"processing"}

event: output
data: {"chunk":"Onions","index":0}

event: output
data: {"chunk":" bloom","index":1}

event: completed
data: {"id":"abc123","status":"succeeded","output":["Onions"," bloom"],"metrics":{"predict_time":0.42}}

Prediction streams can emit these event types:

  • start: The prediction started processing.
  • output: The model yielded an output chunk. The payload includes chunk and index.
  • log: The model wrote to stdout or stderr. The payload includes source and data.
  • metric: The model recorded a custom metric. The payload includes name, value, and mode.
  • completed: The prediction reached a terminal state. The payload is the final prediction object, with status set to succeeded, failed, or canceled.
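Clients reading this stream have to split it into events before decoding the JSON payloads. Below is a minimal parser sketch for the frame format shown above (event: and data: lines, with a blank line ending each event); it skips comment lines and ignores other SSE fields such as id: and retry:, and the name parse_sse is hypothetical.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from decoded text/event-stream lines."""
    event_name = "message"  # SSE default when no event: field is sent
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "event":
            event_name = value
        elif field == "data":
            data_lines.append(value)
    # As in the SSE spec, an event without a terminating blank line is dropped.
```

With urllib, one option is to wrap the response in io.TextIOWrapper(response, encoding="utf-8") and pass it directly as the lines argument, stopping once a completed event arrives.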

For command-line clients, use a client that prints the response as data arrives. With curl, pass -N (--no-buffer) to disable output buffering:

curl -N \
  -H 'Accept: text/event-stream' \
  -H 'Content-Type: application/json' \
  -d '{"input":{"prompt":"Write a haiku about onions"}}' \
  http://localhost:5000/predictions

For browser clients, use fetch() or another client that supports request bodies. The browser EventSource API only supports GET requests, so it cannot create a prediction with POST /predictions or PUT /predictions/<prediction_id>.

const response = await fetch("/predictions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "text/event-stream",
  },
  body: JSON.stringify({ input: { prompt: "Write a haiku about onions" } }),
});

const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();

while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  console.log(value);
}

Use PUT /predictions/<prediction_id> when the client needs safe retries or wants to reconnect to an in-flight prediction by ID:

PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8
Accept: text/event-stream

{
    "input": {"prompt": "Write a haiku about onions"}
}

If the prediction is still running, the server returns a stream for the existing prediction instead of creating a duplicate prediction. If earlier events have been dropped from the replay buffer, the stream emits an error event and closes. The replay buffer keeps the most recent 1024 events by default. Set COG_STREAM_HISTORY_CAPACITY to change this limit, or set it to 0 to disable replay history while keeping live streaming enabled. Training endpoints do not support SSE streaming; requests to /trainings with Accept: text/event-stream return 406 Not Acceptable.

Webhooks

You can provide a webhook parameter in the client request body when creating a prediction.

POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"},
    "webhook": "https://example.com/webhook/prediction"
}

The server makes requests to the provided URL, with the current state of the prediction object in the request body, at the following times:

  • start: Once, when the prediction starts (status is starting).
  • output: Each time the run function generates an output (once when using return, or multiple times when using yield).
  • logs: Each time the run function writes to stdout.
  • completed: Once, when the prediction reaches a terminal state (status is succeeded, canceled, or failed).

Webhook requests for start and completed event types are sent immediately. Webhook requests for output and logs event types are sent at most once every 500ms. This interval is not configurable.
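On the receiving side, every webhook request carries the whole prediction object, so a handler mostly needs to look at its status. A minimal receiver sketch using the standard library's http.server; it assumes webhooks arrive as POST requests with a JSON body, the handler and helper names are hypothetical, and a production receiver would also need authentication and error handling.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Terminal statuses named for the completed event above.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def is_terminal(prediction: dict) -> bool:
    """Return True once the prediction has reached a terminal state."""
    return prediction.get("status") in TERMINAL_STATUSES


class WebhookHandler(BaseHTTPRequestHandler):
    # Assumption: webhook requests are POSTs with a JSON prediction body.
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        prediction = json.loads(self.rfile.read(length) or b"{}")
        if is_terminal(prediction):
            print("final state:", prediction.get("status"), prediction.get("output"))
        else:
            print("update:", prediction.get("status"))
        # Acknowledge quickly; hand any slow work off elsewhere.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block forever, serving webhook requests on the given port."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Because output and logs updates are throttled to at most one every 500ms, a handler should treat each request as a snapshot rather than expect one request per yielded value.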

By default, the server sends requests for all event types. Clients can specify which events trigger webhook requests with the webhook_events_filter parameter in the prediction request body. For example, the following request specifies that webhooks are sent by the server only at the start and end of the prediction:

POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"},
    "webhook": "https://example.com/webhook/prediction",
    "webhook_events_filter": ["start", "completed"]
}
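A client can build this request body programmatically and reject filter values outside the four event types listed above before sending anything. A sketch; the function name async_prediction_body is hypothetical, and the result is meant to be sent with the Prefer: respond-async header.

```python
import json
from typing import Optional, Sequence

# Event types that webhook_events_filter can name.
WEBHOOK_EVENTS = ("start", "output", "logs", "completed")


def async_prediction_body(
    inputs: dict,
    webhook: str,
    events: Optional[Sequence[str]] = None,
) -> bytes:
    """Encode a prediction request body with an optional webhook_events_filter."""
    body: dict = {"input": inputs, "webhook": webhook}
    if events is not None:
        unknown = set(events) - set(WEBHOOK_EVENTS)
        if unknown:
            raise ValueError(f"unsupported webhook events: {sorted(unknown)}")
        body["webhook_events_filter"] = list(events)
    return json.dumps(body).encode("utf-8")
```

Leaving events as None omits the filter, so the server falls back to sending all event types.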

Generating unique prediction IDs

Endpoints for creating and canceling a prediction idempotently accept a prediction_id parameter in their path. By default, the server runs one prediction at a time, but this can be increased with the concurrency.max setting. When all prediction slots are in use, the server returns 409 Conflict. The client should ensure prediction slots are available before creating a new prediction with a different ID.

Clients are responsible for providing unique prediction IDs. We recommend generating a UUIDv4 or UUIDv7, base32-encoding that value, and removing the trailing padding characters (=). Lowercased, as in the example below, this produces an identifier that is 26 ASCII characters long.

>>> from uuid import uuid4
>>> from base64 import b32encode
>>> b32encode(uuid4().bytes).decode('utf-8').lower().rstrip('=')
'wjx3whax6rf4vphkegkhcvpv6a'
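Because the ID is what makes PUT requests idempotent, a client should generate it once and reuse it on every retry. The sketch below combines the recipe above with a retry loop that also backs off on 409 Conflict. The send callable is a hypothetical stand-in for your HTTP client: it is assumed to issue PUT /predictions/<prediction_id> and return the status code, or raise OSError on a network failure. The attempt count and delays are arbitrary.

```python
import time
from base64 import b32encode
from typing import Callable
from uuid import uuid4


def new_prediction_id() -> str:
    """26-character lowercase base32 ID, following the recipe above."""
    return b32encode(uuid4().bytes).decode("utf-8").lower().rstrip("=")


def put_with_retries(
    send: Callable[[str], int],
    attempts: int = 5,
    delay: float = 0.5,
) -> tuple[str, int]:
    """Call send(prediction_id) until it succeeds; return (id, last status).

    The same ID is reused on every attempt, so a retry after a dropped
    connection reattaches to the existing prediction instead of creating
    a duplicate. A 409 means all prediction slots are busy, so wait and retry.
    """
    prediction_id = new_prediction_id()
    status = 0  # 0 marks "no HTTP response received"
    for attempt in range(attempts):
        try:
            status = send(prediction_id)
        except OSError:
            status = 0  # network error: retry with the same ID
        if status not in (0, 409):
            return prediction_id, status
        if attempt < attempts - 1:
            time.sleep(delay * 2 ** attempt)  # exponential backoff
    return prediction_id, status
```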

File uploads

A model's run function can produce file output by yielding or returning a cog.Path or cog.File value.

By default, files are returned as a base64-encoded data URL.

POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,..."
}
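To save a file output returned this way, the client decodes the data URL itself. A sketch that handles the data:<media type>;base64,<payload> form shown above (not every data URL variant); the name decode_data_url is hypothetical.

```python
import base64


def decode_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (media_type, raw_bytes)."""
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, payload = url[len("data:"):].partition(",")
    if not sep or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    media_type = header[: -len(";base64")]
    return media_type, base64.b64decode(payload)
```

For example, media_type, data = decode_data_url(prediction["output"]) followed by open("output.png", "wb").write(data) writes the returned image to disk.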

When creating a prediction synchronously, the client can instead have output files uploaded to a base URL by setting the output_file_prefix parameter in the request body:

POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"},
    "output_file_prefix": "https://example.com/upload"
}

When the model produces a file output, the server sends the following request to upload the file to the configured URL:

PUT /upload HTTP/1.1
Host: example.com
Content-Type: multipart/form-data; boundary=boundary

--boundary
Content-Disposition: form-data; name="file"; filename="image.png"
Content-Type: image/png

<binary data>
--boundary--

If the upload succeeds, the prediction response contains the URL of the uploaded file as its output:

HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "https://example.com/upload/image.png"
}

If the upload fails, the server responds with an error.

[!IMPORTANT]
File uploads for predictions created asynchronously require --upload-url to be specified when starting the HTTP server.

Endpoints

GET /

Returns a discovery document listing available API endpoints, the OpenAPI schema URL, and version information.

GET / HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
    "cog_version": "0.17.0",
    "docs_url": "/docs",
    "openapi_url": "/openapi.json",
    "shutdown_url": "/shutdown",
    "healthcheck_url": "/health-check",
    "predictions_url": "/predictions",
    "predictions_idempotent_url": "/predictions/{prediction_id}",
    "predictions_cancel_url": "/predictions/{prediction_id}/cancel"
}

If training is configured, the response also includes trainings_url, trainings_idempotent_url, and trainings_cancel_url fields.

GET /health-check

Returns the current health status of the model container. This endpoint always responds with 200 OK — check the status field in the response body to determine readiness.

The response body is a JSON object with the following fields:

  • status: One of the following values:
    • STARTING: The model's setup() method is still running.
    • READY: The model is ready to accept predictions.
    • BUSY: The model is ready but all prediction slots are in use.
    • SETUP_FAILED: The model's setup() method raised an exception.
    • DEFUNCT: The model encountered an unrecoverable error.
    • UNHEALTHY: The model is ready but a user-defined healthcheck() method returned False.
  • setup: Setup phase details (included once setup has started):
    • started_at: ISO 8601 timestamp of when setup began.
    • completed_at: ISO 8601 timestamp of when setup finished (if complete).
    • status: One of starting, succeeded, or failed.
    • logs: Output captured during setup.
  • version: Runtime version information:
    • coglet: Coglet version.
    • cog: Cog Python SDK version (if available).
    • python: Python version (if available).
  • user_healthcheck_error: Error message from a user-defined healthcheck() method (if applicable).

GET /health-check HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "READY",
    "setup": {
        "started_at": "2025-01-01T00:00:00.000000+00:00",
        "completed_at": "2025-01-01T00:00:05.000000+00:00",
        "status": "succeeded",
        "logs": ""
    },
    "version": {
        "coglet": "0.17.0",
        "cog": "0.14.0",
        "python": "3.13.0"
    }
}
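Since /health-check always returns 200 OK, a readiness check has to inspect the body. A polling sketch; fetch is a hypothetical callable assumed to return the decoded JSON body (for example via urllib.request.urlopen and json.load), and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable

# Statuses after which waiting longer will not help.
FATAL_STATUSES = {"SETUP_FAILED", "DEFUNCT"}


def wait_until_ready(
    fetch: Callable[[], dict],
    timeout: float = 300.0,
    interval: float = 1.0,
) -> dict:
    """Poll the health-check body until status is READY; return that body.

    STARTING, BUSY, and UNHEALTHY are treated as transient and polled again.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch()
        status = body.get("status")
        if status == "READY":
            return body
        if status in FATAL_STATUSES:
            raise RuntimeError(f"model cannot become ready: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout}s")
        time.sleep(interval)
```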

GET /openapi.json

The OpenAPI specification of the API, which is derived from the input and output types declared in your model's run() method and, if training is configured, its training interface.

POST /predictions

Makes a single prediction.

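The request body is a JSON object with the following fields:

  • input: A JSON object whose keys match the arguments of the run() function. File and Path inputs can be passed as URLs.
  • id: (optional) A prediction ID, needed if you want to cancel an asynchronous prediction later.
  • webhook: (optional) A URL to send prediction updates to. See Webhooks.
  • webhook_events_filter: (optional) The event types that trigger webhook requests. See Webhooks.
  • output_file_prefix: (optional) A base URL to upload output files to. See File uploads.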

The response body is a JSON object with the following fields:

  • status: Either succeeded or failed.
  • output: The return value of the run() function.
  • error: If status is failed, the error message.
  • metrics: An object containing prediction metrics. Always includes predict_time (elapsed seconds). May also include custom metrics recorded by the model using self.record_metric().

POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {
        "image": "https://example.com/image.jpg",
        "text": "Hello world!"
    }
}
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,...",
    "metrics": {
        "predict_time": 4.52
    }
}

If the client sets the Prefer: respond-async header in their request, the server responds immediately after starting the prediction with 202 Accepted status and a prediction object in status starting.

POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "status": "starting"
}

If the client sets the Accept: text/event-stream header, the server starts the prediction asynchronously and responds with a server-sent event stream. See Streaming predictions with server-sent events.

PUT /predictions/<prediction_id>

Makes a single prediction. This is the idempotent version of the POST /predictions endpoint.

PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
HTTP/1.1 200 OK
Content-Type: application/json

{
    "status": "succeeded",
    "output": "data:image/png;base64,..."
}

If the client sets the Prefer: respond-async header in their request, the server responds immediately after starting the prediction with 202 Accepted status and a prediction object in status starting.

PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "input": {"prompt": "A picture of an onion with sunglasses"}
}
HTTP/1.1 202 Accepted
Content-Type: application/json

{
    "id": "wjx3whax6rf4vphkegkhcvpv6a",
    "status": "starting"
}

If the client sets the Accept: text/event-stream header, the server starts the prediction asynchronously and responds with a server-sent event stream. If a prediction with the same ID is already running, the server returns a stream for the existing prediction. See Streaming predictions with server-sent events.

POST /predictions/<prediction_id>/cancel

A client can cancel an asynchronous prediction by making a POST /predictions/<prediction_id>/cancel request using the prediction id provided when the prediction was created.

For example, if the client creates a prediction by sending the request:

POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async

{
    "id": "abcd1234",
    "input": {"prompt": "A picture of an onion with sunglasses"}
}

The client can cancel the prediction by sending the request:

POST /predictions/abcd1234/cancel HTTP/1.1

A prediction cannot be canceled if it was created synchronously (without the Prefer: respond-async header) or without a provided id.

If a prediction exists with the provided id, the server responds with status 200 OK. Otherwise, the server responds with status 404 Not Found.

When a prediction is canceled, Cog raises CancelationException in sync runners (or asyncio.CancelledError in async runners). The model may catch this exception to perform necessary cleanup. The cleanup should be brief, ideally completing within a few seconds. After cleanup, the exception must be re-raised using a bare raise statement. Failure to re-raise the exception may result in the termination of the container.

from cog import BaseRunner, CancelationException, Input, Path

class Runner(BaseRunner):
    def run(self, image: Path = Input(description="Image to process")) -> Path:
        try:
            return self.process(image)
        except CancelationException:
            self.cleanup()
            raise  # always re-raise
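Async runners receive asyncio.CancelledError instead, and the same catch, clean up, re-raise shape applies. The sketch below uses plain asyncio with no Cog imports to show why the bare raise matters: cleanup runs and the task is still reported as cancelled. The slow_work coroutine and the cleaned_up list are hypothetical stand-ins for a real model.

```python
import asyncio

cleaned_up: list[str] = []


async def slow_work() -> str:
    # Stand-in for a long-running async run() body.
    try:
        await asyncio.sleep(60)
        return "done"
    except asyncio.CancelledError:
        cleaned_up.append("temp files removed")  # keep cleanup brief
        raise  # re-raise so the cancellation propagates


async def main() -> bool:
    task = asyncio.create_task(slow_work())
    await asyncio.sleep(0)  # let the task start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()
```

Running asyncio.run(main()) returns True; if slow_work swallowed the exception instead of re-raising it, the task would finish normally and task.cancelled() would be False.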

Notebooks

Cog plays nicely with Jupyter notebooks.

Install the jupyterlab Python package

First, add jupyterlab to your requirements.txt file and reference it in cog.yaml:

requirements.txt:

jupyterlab

cog.yaml:

build:
  python_requirements: requirements.txt

Run a notebook

Cog can run notebooks in the environment you've defined in cog.yaml with the following command:

cog exec -p 8888 jupyter lab --allow-root --ip=0.0.0.0

Use notebook code in your runner

You can also import a notebook into your Cog Runner file.

First, export your notebook to a Python file:

jupyter nbconvert --to script my_notebook.ipynb # creates my_notebook.py

Then import the exported Python script into your run.py file. Any functions or variables defined in your notebook will be available to your runner:

from cog import BaseRunner, Input

import my_notebook

class Runner(BaseRunner):
    def run(self, prompt: str = Input(description="string prompt")) -> str:
        output = my_notebook.do_stuff(prompt)
        return output

Private package registry

This guide describes how to build a Docker image with Cog that fetches Python packages from a private registry during setup.

pip.conf

In a directory outside your Cog project, create a pip.conf file with an index-url set to the registry's URL with embedded credentials.

[global]
index-url = https://username:password@my-private-registry.com

Warning: Be careful not to commit secrets in Git or include them in Docker images. If your Cog project contains any sensitive files, make sure they're listed in .gitignore and .dockerignore.

cog.yaml

In your project's cog.yaml file, add a setup command to run pip install with a secret configuration file mounted to /etc/pip.conf.

build:
  run:
    - command: pip install my-private-package  # hypothetical package name
      mounts:
        - type: secret
          id: pip
          target: /etc/pip.conf

Build

When building or pushing your model with Cog, pass the --secret option with an id matching the one specified in cog.yaml, along with a path to your local pip.conf file.

$ cog build --secret id=pip,source=/path/to/pip.conf

Using a secret mount allows the private registry credentials to be securely passed to the pip install setup command, without baking them into the Docker image.

Warning: If you run cog build or cog push and then change the contents of a secret source file, the cached version of the file will be used on subsequent builds, ignoring any changes you made. To update the contents of the target secret file, either change the id value in cog.yaml and the --secret option, or pass the --no-cache option to bypass the cache entirely.


Run interface reference

This document defines the API of the cog Python module, which is used to define the interface for running your model.

[!TIP] Run cog init to generate an annotated run.py file that can be used as a starting point for setting up your model.

BaseRunner

You define how Cog runs your model by defining a class that inherits from BaseRunner. It looks something like this:

from cog import BaseRunner, Path, Input
import torch

class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.model = torch.load("weights.pth")

    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5)
    ) -> Path:
        """Run the model"""
        # ... pre-processing ...
        output = self.model(image)
        # ... post-processing ...
        return output

Your Runner class defines two methods: run(), which is required, and setup(), which is optional.

BasePredictor, Predictor, and predict() still work for existing models, but they are deprecated. Cog warns when it loads or inspects those legacy names. Use BaseRunner, Runner, and run() for new code.

Runner.setup()

Prepare the model so multiple runs are efficient.

Use this optional method to include expensive one-off operations like loading trained models, instantiating data transformations, etc.

Many models use this method to download their weights (e.g. using pget). This has some advantages:

  • Smaller image sizes
  • Faster build times
  • Faster pushes and inference on Replicate

However, this may also significantly increase your setup() time.
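If you do download weights in setup(), caching them on disk avoids paying the download cost on every start. A sketch using the standard library's urllib.request in place of pget; the ensure_weights name and the idea of calling it at the top of setup() are assumptions, not a Cog API.

```python
import shutil
import urllib.request
from pathlib import Path


def ensure_weights(url: str, dest: Path) -> bool:
    """Download url to dest unless it already exists; return True if downloaded.

    Writes to a temporary sibling file first, so an interrupted download
    never leaves a truncated file at dest.
    """
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".partial")
    with urllib.request.urlopen(url) as response, tmp.open("wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)  # atomic rename on the same filesystem
    return True
```

Inside setup() you would call something like ensure_weights(WEIGHTS_URL, Path("weights.pth")) before loading the file, where WEIGHTS_URL is a hypothetical constant pointing at your weights.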

As an alternative, some choose to store their weights directly in the image. You can simply leave your weights in the directory alongside your cog.yaml and ensure they are not excluded in your .dockerignore file.

While this will increase your image size and build time, it offers other advantages:

  • Faster setup() time
  • Ensures idempotency and reduces your model's reliance on external systems
  • Preserves reproducibility as your model will be self-contained in the image

When using this method, you should use the --separate-weights flag on cog build to store weights in a separate layer.

Runner.run(**kwargs)

Run the model.

This required method is where you call the model that was loaded during setup(), but you may also want to add pre- and post-processing code here.

The run() method takes an arbitrary list of named arguments, each of which defines one of the model's inputs. See Input(**kwargs) for how to document and constrain them.

run() can return strings, numbers, cog.Path objects representing files on disk, or lists or dicts of those types. You can also define a custom BaseModel for structured return types. See Input and output types for the full list of supported types.

async runners and concurrency

Added in cog 0.14.0.

You may specify your run() method as async def run(...). In addition, if you have an async run() function you may also have an async setup() function:

from cog import BaseRunner

class Runner(BaseRunner):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def run(self) -> str:
        print("async run")
        return "hello world"

Models that have an async run() function can run concurrently, up to the limit specified by concurrency.max in cog.yaml. Attempting to exceed this limit will return a 409 Conflict response.

Input(**kwargs)

Use cog's Input() function to define each of the parameters in your run() method:

from cog import BaseRunner, Input, Path

class Runner(BaseRunner):
    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5, ge=1.0, le=10.0)
    ) -> Path:
        ...  # run the model here and return the output path

The Input() function takes these keyword arguments:

  • description: A description of what to pass to this input for users of the model.
  • default: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to None, the input is optional.
  • ge: For int or float types, the value must be greater than or equal to this number.
  • le: For int or float types, the value must be less than or equal to this number.
  • min_length: For str types, the minimum length of the string.
  • max_length: For str types, the maximum length of the string.
  • regex: For str types, the string must match this regular expression.
  • choices: For str or int types, a list of possible values for this input.
  • deprecated: (optional) If set to True, marks this input as deprecated. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future. See Deprecating inputs.
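To make the numeric, length, and choices constraints concrete, here is a plain-Python sketch of what ge, le, min_length, max_length, and choices check. It is an illustration only, not Cog's own validator, and it leaves out regex because the exact matching semantics (full match or search) aren't specified here; the check_input name is hypothetical.

```python
from typing import Any, Optional, Sequence


def check_input(
    value: Any,
    ge: Optional[float] = None,
    le: Optional[float] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    choices: Optional[Sequence[Any]] = None,
) -> list[str]:
    """Return a list of constraint violations (empty if the value is valid)."""
    errors = []
    if ge is not None and value < ge:
        errors.append(f"{value!r} is less than {ge}")
    if le is not None and value > le:
        errors.append(f"{value!r} is greater than {le}")
    if min_length is not None and len(value) < min_length:
        errors.append(f"length {len(value)} is below min_length {min_length}")
    if max_length is not None and len(value) > max_length:
        errors.append(f"length {len(value)} is above max_length {max_length}")
    if choices is not None and value not in choices:
        errors.append(f"{value!r} is not one of {list(choices)}")
    return errors
```

With the scale input from the example above, check_input(1.5, ge=1.0, le=10.0) returns an empty list, while a value of 0.5 produces one violation.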

Each parameter of the run() method must be annotated with a type like str, int, float, bool, etc. See Input and output types for the full list of supported types.

Using the Input function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:

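class Runner(BaseRunner):
    def run(self,
        iterations: int,                 # valid: no default, so the input is required
        prompt: str = "default prompt",  # also valid: a plain Python default
    ) -> str:
        # Python requires parameters without defaults to come before those with defaults
        ...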

Deprecating inputs

You can mark an input as deprecated by passing deprecated=True to the Input() function. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future.

This is useful when you want to phase out an input without breaking existing clients immediately:

from cog import BaseRunner, Input

class Runner(BaseRunner):
    def run(self,
        text: str = Input(description="Some deprecated text", deprecated=True),
        prompt: str = Input(description="Prompt for the model")
    ) -> str:
        # ...
        return prompt

Output

Cog runners can return a simple data type like a string, integer, float, or boolean. Use Python's -> <type> syntax to annotate the return type.

Here's an example of a runner that returns a string:

from cog import BaseRunner

class Runner(BaseRunner):
    def run(self) -> str:
        return "hello"

Returning an object

To return a complex object with multiple values, define an Output class with multiple fields and return an instance of it from your run() method:

import io

from cog import BaseRunner, BaseModel, File

class Output(BaseModel):
    file: File
    text: str

class Runner(BaseRunner):
    def run(self) -> Output:
        return Output(text="hello", file=io.StringIO("hello"))

Each of the output object's properties must be one of the supported output types. For the full list, see Input and output types.

Returning a list

The run() method can return a list of any of the supported output types. Here's an example that outputs multiple files:

from cog import BaseRunner, Path

class Runner(BaseRunner):
    def run(self) -> list[Path]:
        items = ["foo", "bar", "baz"]
        output = []
        for i, item in enumerate(items):
            out_path = Path(f"/tmp/out-{i}.txt")
            with out_path.open("w") as f:
                f.write(item)
            output.append(out_path)
        return output

Files are named in the format output.<index>.<extension>, e.g. output.0.txt, output.1.txt, and output.2.txt from the example above.

Optional properties

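To conditionally omit properties from the Output object, declare them with typing.Optional and give them a default of None:

import tempfile
from typing import Optional
from cog import BaseModel, BaseRunner, Path

class Output(BaseModel):
    score: Optional[float] = None
    file: Optional[Path] = None

class Runner(BaseRunner):
    def run(self, return_score: bool) -> Output:
        if return_score:
            return Output(score=1.5)
        # The returned file must exist, so write it to a temporary directory first
        output_path = Path(tempfile.mkdtemp()) / "hello.txt"
        output_path.write_text("hello")
        return Output(file=output_path)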

Streaming output

Cog models can stream output as the run() method is running. For example, a language model can output tokens as they're being generated and an image generation model can output images as they are being generated.

To support streaming output in your Cog model, add from typing import Iterator to your run.py file. The typing package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the run() method in the form -> Iterator[<type>] where <type> can be one of str, int, float, bool, or cog.Path.

To allow clients to receive chunks as server-sent events with Accept: text/event-stream, decorate the prediction method (run() or predict()) with @cog.streaming (or @streaming if imported directly from cog). The parenthesized forms @cog.streaming() and @streaming() are also accepted. The decorated method must return Iterator[...], AsyncIterator[...], ConcatenateIterator[...], or AsyncConcatenateIterator[...]. Without the decorator, iterator outputs still work in normal JSON responses, but SSE requests return 406 Not Acceptable.

from typing import Iterator
from cog import BaseRunner, Path, streaming

class Runner(BaseRunner):
    @streaming
    def run(self) -> Iterator[Path]:
        done = False
        while not done:
            output_path, done = do_stuff()  # do_stuff() is a placeholder for your own generation step
            yield Path(output_path)
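For reference, the generic server-sent events format frames each chunk as one or more data: lines followed by a blank line. The sketch below shows only that generic framing for text chunks; Cog's actual event names and JSON payloads are not described in this section, so to_sse is a hypothetical helper that sends each raw chunk as a plain data field.

```python
from typing import Iterable, Iterator


def to_sse(chunks: Iterable[str]) -> Iterator[str]:
    """Frame each text chunk as a server-sent event (generic SSE framing)."""
    for chunk in chunks:
        # A multi-line payload needs one "data:" field per line
        fields = [f"data: {line}" for line in chunk.split("\n")]
        # A blank line terminates the event
        yield "\n".join(fields) + "\n\n"


stream = "".join(to_sse(["The ", "quick "]))
print(repr(stream))  # 'data: The \n\ndata: quick \n\n'
```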

If you have an async run() method, use AsyncIterator from the typing module:

from typing import AsyncIterator
from cog import BaseRunner, Path, streaming

class Runner(BaseRunner):
    @streaming
    async def run(self) -> AsyncIterator[Path]:
        done = False
        while not done:
            output_path, done = do_stuff()  # do_stuff() is a placeholder for your own generation step
            yield Path(output_path)

If you're streaming text output, you can use ConcatenateIterator to hint that the output should be concatenated together into a single string. This is useful on Replicate to display the output as a string instead of a list of strings.

from cog import BaseRunner, ConcatenateIterator, streaming

class Runner(BaseRunner):
    @streaming
    def run(self) -> ConcatenateIterator[str]:
        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
        for token in tokens:
            yield token + " "

Or for async run() methods, use AsyncConcatenateIterator:

from cog import AsyncConcatenateIterator, BaseRunner, streaming

class Runner(BaseRunner):
    @streaming
    async def run(self) -> AsyncConcatenateIterator[str]:
        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
        for token in tokens:
            yield token + " "
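ConcatenateIterator is only a hint about how the chunks should be displayed. As a plain-Python sketch that runs without Cog, a consumer honoring that hint joins sync or async chunks into a single string; tokens, atokens, collect_text, and acollect_text are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Iterator


def tokens() -> Iterator[str]:
    for token in ["The", "quick", "brown", "fox"]:
        yield token + " "


async def atokens() -> AsyncIterator[str]:
    for token in ["The", "quick", "brown", "fox"]:
        await asyncio.sleep(0)  # hand control back to the event loop, like a real async model
        yield token + " "


def collect_text(chunks: Iterator[str]) -> str:
    # Concatenate the chunks instead of treating them as a list of strings
    return "".join(chunks)


async def acollect_text(chunks: AsyncIterator[str]) -> str:
    return "".join([chunk async for chunk in chunks])


text = collect_text(tokens())
atext = asyncio.run(acollect_text(atokens()))
print(repr(text), repr(atext))
```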

Metrics

You can record custom metrics from your run() method to track model-specific data such as token counts, timing breakdowns, or confidence scores. Metrics are included in the response alongside the output.

Recording metrics

Use self.record_metric() inside your run() method:

from cog import BaseRunner

class Runner(BaseRunner):
    def run(self, prompt: str) -> str:
        self.record_metric("temperature", 0.7)
        self.record_metric("token_count", 42)

        result = self.model.generate(prompt)
        return result

For advanced use (dict-style access, deleting metrics), use self.scope:

self.scope.metrics["token_count"] = 42
del self.scope.metrics["token_count"]

Metrics appear in the response metrics field:

{
  "status": "succeeded",
  "output": "...",
  "metrics": {
    "temperature": 0.7,
    "token_count": 42,
    "predict_time": 1.23
  }
}

The predict_time metric is always added automatically by the runtime.

Supported value types are bool, int, float, str, list, and dict. Setting a metric to None deletes it.

Naming rules

Metric names must follow these rules:

  • Each segment must start with a letter (a-z, A-Z) and end with a letter or digit
  • Segments can contain letters, digits, and underscores (_)
  • Segments cannot start or end with underscores
  • Segments cannot contain consecutive underscores (__)
  • Use dots (.) to create nested objects (e.g., timing.inference produces {"timing": {"inference": ...}})
  • Maximum 128 characters total
  • Maximum 4 dot-separated segments
  • Cannot be predict_time (reserved by runtime)
  • Cannot start with cog. (reserved for system metrics)

Valid examples: temperature, token_count, TTFT, T2I_latency, timing.preprocess

Invalid examples: _token, token_, foo__bar, .foo, foo..bar, foo bar, cog.system
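The naming rules above can be checked with a short sketch. This is a hypothetical validator written from the listed rules, not Cog's own implementation; the segment regex requires a leading letter and allows single underscores only between letters or digits, which also rules out leading, trailing, and doubled underscores.

```python
import re

# A letter, then any run of letters/digits where each underscore must be followed by a letter or digit
_SEGMENT = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")


def is_valid_metric_name(name: str) -> bool:
    """Hypothetical check of a metric name against the documented rules."""
    if len(name) > 128:
        return False
    if name == "predict_time" or name.startswith("cog."):
        return False  # reserved by the runtime
    segments = name.split(".")
    if len(segments) > 4:
        return False
    # Empty segments (".foo", "foo..bar") fail the regex as well
    return all(_SEGMENT.fullmatch(segment) for segment in segments)


valid = ["temperature", "token_count", "TTFT", "T2I_latency", "timing.preprocess"]
invalid = ["_token", "token_", "foo__bar", ".foo", "foo..bar", "foo bar", "cog.system"]
print([is_valid_metric_name(n) for n in valid + invalid])
```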

Accumulation modes

By default, recording a metric replaces any previous value for that key. You can use accumulation modes to build up values across multiple calls:

# Increment a counter (adds to the existing numeric value)
self.record_metric("token_count", 1, mode="incr")
self.record_metric("token_count", 1, mode="incr")
# Result: {"token_count": 2}

# Append to an array
self.record_metric("steps", "preprocessing", mode="append")
self.record_metric("steps", "inference", mode="append")
# Result: {"steps": ["preprocessing", "inference"]}

# Replace (default behavior)
self.record_metric("status", "running", mode="replace")
self.record_metric("status", "done", mode="replace")
# Result: {"status": "done"}

The mode parameter accepts "replace" (default), "incr", or "append".
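To make the three modes concrete, here is a minimal sketch that applies them to a flat dictionary. The record function is hypothetical and only models the results shown above; in particular, starting an "incr" counter from 0 when the key is absent is an assumption.

```python
from typing import Any


def record(metrics: dict[str, Any], key: str, value: Any, mode: str = "replace") -> None:
    """Hypothetical model of the replace/incr/append accumulation modes."""
    if mode == "replace":
        metrics[key] = value
    elif mode == "incr":
        # Assumption: a missing counter starts at 0
        metrics[key] = metrics.get(key, 0) + value
    elif mode == "append":
        metrics.setdefault(key, []).append(value)
    else:
        raise ValueError(f"unknown mode: {mode!r}")


m: dict[str, Any] = {}
record(m, "token_count", 1, mode="incr")
record(m, "token_count", 1, mode="incr")
record(m, "steps", "preprocessing", mode="append")
record(m, "steps", "inference", mode="append")
record(m, "status", "running")
record(m, "status", "done")
print(m)  # {'token_count': 2, 'steps': ['preprocessing', 'inference'], 'status': 'done'}
```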

Dot-path keys

Use dot-separated keys to create nested objects in the metrics output:

self.record_metric("timing.preprocess", 0.12)
self.record_metric("timing.inference", 0.85)

This produces nested JSON:

{
  "metrics": {
    "timing": {
      "preprocess": 0.12,
      "inference": 0.85
    },
    "predict_time": 1.23
  }
}
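The expansion from dot-separated keys to nested objects can be sketched as follows. nest_metrics is a hypothetical helper showing the idea; it does not handle a key that is used both as a leaf and as a parent.

```python
from typing import Any


def nest_metrics(flat: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: expand dot-separated keys into nested dicts."""
    nested: dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            # Create intermediate objects on demand
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


result = nest_metrics({"timing.preprocess": 0.12, "timing.inference": 0.85, "predict_time": 1.23})
print(result)  # {'timing': {'preprocess': 0.12, 'inference': 0.85}, 'predict_time': 1.23}
```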

Type safety

Once a metric key has been assigned a value of a certain type, it cannot be changed to a different type without deleting it first. This prevents accidental type mismatches when using accumulation modes:

self.record_metric("count", 1)

# This would raise an error — "count" is an int, not a string:
# self.record_metric("count", "oops")

# Delete first, then set with new type:
del self.scope.metrics["count"]
self.record_metric("count", "now a string")

Outside an active run, self.record_metric() and self.scope are silent no-ops — no need for None checks.
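The type-locking behavior, together with the rule that setting a metric to None deletes it, can be modeled with a small class. TypedMetrics is a hypothetical stand-in, not Cog's scope object, and it compares exact Python types, which may be stricter than Cog (for example for int versus float).

```python
from typing import Any


class TypedMetrics:
    """Hypothetical stand-in for type-locked metric storage."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}

    def set(self, key: str, value: Any) -> None:
        if value is None:
            # Setting a metric to None deletes it
            self._values.pop(key, None)
            return
        old = self._values.get(key)
        if old is not None and type(old) is not type(value):
            raise TypeError(f"{key!r} is {type(old).__name__}, not {type(value).__name__}")
        self._values[key] = value

    def delete(self, key: str) -> None:
        del self._values[key]

    def as_dict(self) -> dict[str, Any]:
        return dict(self._values)


metrics = TypedMetrics()
metrics.set("count", 1)
try:
    metrics.set("count", "oops")  # rejected: "count" already holds an int
except TypeError as exc:
    print(exc)
metrics.delete("count")
metrics.set("count", "now a string")  # allowed after deleting
print(metrics.as_dict())  # {'count': 'now a string'}
```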

Cancellation

When a run is canceled (via the cancel HTTP endpoint or a dropped connection), the Cog runtime interrupts the running run() function. The exception raised depends on whether the runner is sync or async:

| Runner type | Exception raised |
| --- | --- |
| Sync (def run) | CancelationException |
| Async (async def run) | asyncio.CancelledError |

CancelationException

from cog import CancelationException

CancelationException is raised in sync runners when a run is canceled. It is a BaseException subclass, not an Exception subclass, so except Exception blocks in your run code will not accidentally catch it (a truly bare except: clause still would). This matches the behavior of KeyboardInterrupt and asyncio.CancelledError.
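The effect of subclassing BaseException can be shown without Cog installed. The class below is a local stand-in that only mirrors cog.CancelationException's base class; run_step and guarded are hypothetical.

```python
class CancelationException(BaseException):
    """Local stand-in mirroring the base class of cog.CancelationException."""


def run_step() -> None:
    raise CancelationException()


def guarded() -> str:
    try:
        run_step()
    except Exception:
        # Never reached: CancelationException is not an Exception subclass
        return "swallowed"
    return "finished"


try:
    guarded()
    outcome = "not canceled"
except CancelationException:
    outcome = "cancellation propagated"
print(outcome)  # cancellation propagated
```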

You do not need to handle this exception in normal runner code; the runtime manages cancellation automatically. However, if you need to run cleanup logic when a run is canceled, you can catch it explicitly:

from cog import BaseRunner, CancelationException, Path

class Runner(BaseRunner):
    def run(self, image: Path) -> Path:
        try:
            return self.process(image)
        except CancelationException:
            self.cleanup()
            raise  # always re-raise

[!WARNING] You must re-raise CancelationException after cleanup. Swallowing it will prevent the runtime from marking the run as canceled, and may result in the termination of the container.

CancelationException is available as:

  • cog.CancelationException (recommended)
  • cog.exceptions.CancelationException

For async runners, cancellation follows standard Python async conventions and raises asyncio.CancelledError instead.
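The standard asyncio pattern looks like this in plain Python: catch asyncio.CancelledError, clean up, and re-raise so the task is recorded as cancelled. The run coroutine and its cleanup step are hypothetical, and task.cancel() stands in for whatever the Cog runtime does when a run is canceled.

```python
import asyncio

cleanup_log: list[str] = []


async def run() -> str:
    try:
        await asyncio.sleep(10)  # stands in for a long-running async model call
        return "done"
    except asyncio.CancelledError:
        cleanup_log.append("cleaned up")  # hypothetical cleanup step
        raise  # re-raise so the task is marked as cancelled


async def main() -> bool:
    task = asyncio.create_task(run())
    await asyncio.sleep(0)  # let run() start and reach its await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


cancelled = asyncio.run(main())
print(cancelled, cleanup_log)  # True ['cleaned up']
```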

Input and output types

Each parameter of the run() method must be annotated with a type. The method's return type must also be annotated.

Primitive types

These types can be used directly as input parameter types and output return types:

| Type | Description | JSON Schema |
| --- | --- | --- |
| str | A string | string |
| int | An integer | integer |
| float | A floating-point number | number |
| bool | A boolean | boolean |
| cog.Path | A path to a file on disk | string (format: uri) |
| cog.File | A file-like object (deprecated) | string (format: uri) |
| cog.Secret | A string containing sensitive information | string (format: password) |

cog.Path

cog.Path is used to get files in and out of models. It represents a path to a file on disk.

cog.Path is a subclass of Python's pathlib.Path and can be used as a drop-in replacement. Any os.PathLike subclass is also accepted as an input type and treated as cog.Path.

For models that return a cog.Path object, the output returned by Cog's built-in HTTP server will be a URL.

This example takes an input file, resizes it, and returns the resized image:

import tempfile
from cog import BaseRunner, Input, Path

class Runner(BaseRunner):
    def run(self, image: Path = Input(description="Image to enlarge")) -> Path:
        upscaled_image = do_some_processing(image)

        # To output cog.Path objects the file needs to exist, so create a temporary file first.
        # This file will automatically be deleted by Cog after it has been returned.
        output_path = Path(tempfile.mkdtemp()) / "upscaled.png"
        upscaled_image.save(output_path)
        return Path(output_path)

cog.File (deprecated)

[!WARNING]
cog.File is deprecated and will be removed in a future version of Cog. Use cog.Path instead.

cog.File represents a file handle. For models that return a cog.File object, the output returned by Cog's built-in HTTP server will be a URL.

from cog import BaseRunner, File, Input
from PIL import Image

class Runner(BaseRunner):
    def run(self, source_image: File = Input(description="Image to enlarge")) -> File:
        pillow_img = Image.open(source_image)
        upscaled_image = do_some_processing(pillow_img)
        return File(upscaled_image)

cog.Secret

cog.Secret signifies that an input holds sensitive information like a password or API token.

cog.Secret redacts its contents in string representations to prevent accidental disclosure. Access the underlying value with get_secret_value().

from cog import BaseRunner, Secret

class Runner(BaseRunner):
    def run(self, api_token: Secret) -> None:
        # Prints '**********'
        print(api_token)

        # Use get_secret_value method to see the secret's content.
        print(api_token.get_secret_value())
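The redaction idea behind cog.Secret (similar in spirit to Pydantic's SecretStr) can be illustrated with a minimal wrapper. RedactedSecret is a hypothetical stand-in, not Cog's class: its string and repr forms hide the value, and only get_secret_value() returns it.

```python
class RedactedSecret:
    """Hypothetical stand-in illustrating a self-redacting secret wrapper."""

    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        return "**********"

    def __repr__(self) -> str:
        return "RedactedSecret('**********')"

    def get_secret_value(self) -> str:
        return self._value


token = RedactedSecret("sk-example")
print(token)                     # **********
print(repr(token))               # RedactedSecret('**********')
print(token.get_secret_value())  # sk-example
```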

A runner's Secret inputs are represented in OpenAPI with the following schema:

{
  "type": "string",
  "format": "password",
  "x-cog-secret": true
}

For models uploaded to Replicate, secret inputs are handled specially throughout Replicate's system: when you create a run on Replicate, any value passed to a Secret input is redacted after it has been sent to the model.

[!WARNING]
Passing secret values to untrusted models can result in unintended disclosure, exfiltration, or misuse of sensitive data.

Wrapper types

Cog supports wrapper types that modify how a primitive type is treated.

Optional

Use Optional[T] or T | None (Python 3.10+) to mark an input as optional. Optional inputs default to None if not provided.

from typing import Optional
from cog import BaseRunner, Input

class Runner(BaseRunner):
    def run(self,
        prompt: Optional[str] = Input(description="Input prompt"),
        seed: int | None = Input(description="Random seed", default=None),
    ) -> str:
        if prompt is None:
            return "hello"
        return "hello " + prompt

Prefer Optional[T] or T | None over str = Input(default=None) for inputs that can be None. This lets type checkers warn about error-prone None values:

# Bad: type annotation says str but value can be None
def run(self, prompt: str = Input(default=None)) -> str:
    return "hello" + prompt  # TypeError at runtime if prompt is None

# Good: type annotation matches actual behavior
def run(self, prompt: Optional[str] = Input(description="prompt")) -> str:
    if prompt is None:
        return "hello"
    return "hello " + prompt

[!NOTE] Optional[T] is supported in BaseModel output fields but not as a top-level return type. Use a BaseModel with optional fields instead.

Union

Use A | B or Union[A, B] to accept more than one type for a single input. Cog supports JSON-native union members: str, int, float, bool, dict/Any, list[T], and None.

from cog import BaseRunner, Input

class Runner(BaseRunner):
    def run(self,
        value: str | float = Input(description="A string or a number"),
    ) -> str:
        return f"{type(value).__name__}:{value}"

At runtime, Cog validates the request against the union and passes the value through as the matching type. For overlapping numeric types, Cog prefers the most specific match (e.g. bool before int, int before float), and a JSON integer is accepted for a float member.
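The preference order can be sketched in plain Python. resolve_union is a hypothetical helper written from the rules above, not Cog's validator; in particular, converting a JSON integer to float when it matches a float member is an assumption.

```python
import json
from typing import Any

# Assumed specificity order for this sketch: most specific first
_PREFERENCE = (bool, int, float, str, list, dict)


def resolve_union(raw_json: str, members: tuple[type, ...]) -> Any:
    """Hypothetical helper: pick the most specific union member a JSON value matches."""
    value = json.loads(raw_json)
    if value is None:
        if type(None) in members:
            return None
        raise ValueError("null is not allowed for this union")
    # bool is a subclass of int in Python, so exclude it from numeric matches
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    for member in _PREFERENCE:
        if member not in members:
            continue
        if member is bool and isinstance(value, bool):
            return value
        if member is int and is_number and isinstance(value, int):
            return value
        if member is float and is_number:
            return float(value)  # a JSON integer is accepted for a float member
        if member in (str, list, dict) and isinstance(value, member):
            return value
    raise ValueError(f"{value!r} matches none of {members}")


print(resolve_union("3", (str, float)), resolve_union('"3"', (str, float)))
```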

Combine a union with None to make it nullable:

def run(self, value: str | float | None = Input(default=None)) -> str: ...

Union inputs are validated at the HTTP boundary, so unions involving Path, File, Secret, custom coders, and BaseModel are not supported, and the build fails if you use them. Union return types are also unsupported — use a BaseModel output instead.

list

Use list[T] or List[T] to accept or return a list of values. T can be a supported Cog type, but nested container types are not supported.

As an input type:

from cog import BaseRunner, Path

class Runner(BaseRunner):
    def run(self, paths: list[Path]) -> str:
        output_parts = []
        for path in paths:
            with open(path) as f:
                output_parts.append(f.read())
        return "".join(output_parts)

With cog run, repeat the input name to pass multiple values:

$ echo test1 > 1.txt
$ echo test2 > 2.txt
$ cog run -i paths=@1.txt -i paths=@2.txt

As an output type:

from cog import BaseRunner, Path

class Runner(BaseRunner):
    def run(self) -> list[Path]:
        items = ["foo", "bar", "baz"]
        output = []
        for i, item in enumerate(items):
            out_path = Path(f"/tmp/out-{i}.txt")
            with out_path.open("w") as f:
                f.write(item)
            output.append(out_path)
        return output

Files are named in the format output.<index>.<extension>, e.g. output.0.txt, output.1.txt, output.2.txt.

dict

Use dict to accept or return an opaque JSON object. The value is passed through as-is without type validation.

from cog import BaseRunner, Input

class Runner(BaseRunner):
    def run(self,
        params: dict = Input(description="Arbitrary JSON parameters"),
    ) -> dict:
        return {"greeting": "hello", "params": params}

[!NOTE] dict inputs and outputs are represented as {"type": "object"} in the OpenAPI schema with no additional structure. For structured data with validated fields, use a BaseModel instead.

cog.Opaque

Cog statically analyzes run() type annotations to generate schemas. Some third-party package types, such as vLLM TypedDict definitions, may not be visible to that static analyzer even though they represent JSON-shaped object values at runtime.

Use typing.Annotated with cog.Opaque when you want Cog to accept or return those third-party object values without inspecting their fields:

from typing import Annotated

from cog import BaseRunner, Opaque
from vllm.entrypoints.chat_utils import CustomChatCompletionMessageParam


class Runner(BaseRunner):
    def run(
        self,
        messages: Annotated[list[CustomChatCompletionMessageParam], Opaque],
    ) -> str:
        return str(messages)

Opaque emits an object schema for the wrapped type and preserves the container shape. For example, Annotated[list[T], Opaque] is represented as an array of opaque objects.

Opaque does not inspect, validate, encode, decode, or transform values. It only tells Cog's schema generator to treat the wrapped type as an opaque JSON object. If your type needs custom serialization or deserialization, provide that separately; Opaque only affects schema generation.

Structured output with BaseModel

To return a complex object with multiple typed fields, define a class that inherits from cog.BaseModel or Pydantic's BaseModel and use it as your return type.

Using cog.BaseModel

cog.BaseModel subclasses are automatically converted to Python dataclasses. Define fields using standard type annotations:

from typing import Optional
from cog import BaseRunner, BaseModel, Path

class Output(BaseModel):
    text: str
    confidence: float
    image: Optional[Path]

class Runner(BaseRunner):
    def run(self, prompt: str) -> Output:
        result = self.model.generate(prompt)
        return Output(
            text=result.text,
            confidence=result.score,
            image=None,
        )
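Since cog.BaseModel subclasses become dataclasses, a plain standard-library dataclass gives a rough picture of the JSON object such an output maps to. This sketch uses a str in place of cog.Path and a default of None for the optional field; Cog's own serialization (for example, turning Path fields into URLs) is not modeled here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Output:
    """Plain dataclass standing in for a cog.BaseModel subclass."""

    text: str
    confidence: float
    image: Optional[str] = None  # a str path stands in for cog.Path


result = Output(text="hello", confidence=0.9)
payload = json.dumps(asdict(result))
print(payload)  # {"text": "hello", "confidence": 0.9, "image": null}
```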

The output class can have any name — it does not need to be called Output:

from typing import Optional
from cog import BaseModel, Path

class SegmentationResult(BaseModel):
    success: bool
    error: Optional[str]
    segmented_image: Optional[Path]

Using Pydantic BaseModel

If you already use Pydantic v2 in your model, you can use a Pydantic BaseModel subclass directly as the output type:

from pydantic import BaseModel as PydanticBaseModel
from cog import BaseRunner

class Result(PydanticBaseModel):
    name: str
    score: float
    tags: list[str]

class Runner(BaseRunner):
    def run(self, prompt: str) -> Result:
        return Result(name="example", score=0.95, tags=["fast", "accurate"])

BaseModel field types

Fields in a BaseModel output support these types:

| Type | Example |
| --- | --- |
| str, int, float, bool | score: float |
| cog.Path | image: Path |
| cog.File | data: File (deprecated) |
| cog.Secret | token: Secret |
| Optional[T] | error: Optional[str] |
| list[T] | tags: list[str] |

Type limitations

The following type patterns are not supported:

  • Nested generics: list[list[str]], list[Optional[str]], Optional[list[str]] are not supported.
  • Output union types beyond Optional: union return types and BaseModel union fields are not supported. Input unions of JSON-native types (str | int, str | float | None, etc.) are supported — see Union.
  • Input unions of non-JSON-native types: input unions involving Path, File, Secret, custom coders, or BaseModel (e.g. Path | str) are not supported and fail at build time.
  • Optional as a top-level return type: -> Optional[str] is not allowed. Use a BaseModel with optional fields instead.
  • Nested BaseModel fields: A BaseModel field typed as another BaseModel is not supported in Cog's type system for schema generation.
  • Tuple, Set, or other collection types: Only list and dict are supported as collection types.

Training interface reference

[!WARNING]
The cog train command is deprecated and will be removed in the next version of Cog. The training API described below may still be used with the HTTP API's /trainings endpoint, but the CLI command is no longer recommended for new projects.

Cog's training API allows you to define a fine-tuning interface for an existing Cog model, so users of the model can bring their own training data to create derivative fine-tuned models. Real-world examples of this API in use include fine-tuning SDXL with images or fine-tuning Llama 2 with structured text.

How it works

If you've used Cog before, you've probably seen the Runner class, which defines the interface for running your model. Cog's training API works similarly: You define a Python function that describes the inputs and outputs of the training process. The inputs are things like training data, epochs, batch size, seed, etc. The output is typically a file with the fine-tuned weights.

cog.yaml:

build:
  python_version: "3.13"
train: "train.py:train"

train.py:

from cog import File
import io

def train(param: str) -> File:
    return io.StringIO("hello " + param)

Then you can run it like this:

$ cog train -i param=train
...

$ cat weights
hello train

You can also use a class if you want to run many trainings and save on setup time. This works the same way as the Runner class; the only difference is that the method is named train() instead of run().

cog.yaml:

build:
  python_version: "3.13"
train: "train.py:Trainer"

train.py:

from cog import File
import io

class Trainer:
    def setup(self) -> None:
        self.base_model = ... # Load a big base model

    def train(self, param: str) -> File:
        return self.base_model.train(param) # Train on top of a base model

Input(**kwargs)

Use Cog's Input() function to define each of the parameters in your train() function:

from typing import Optional
from cog import Input, Path

def train(
    train_data: Path = Input(description="HTTPS URL of a file containing training data"),
    learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
    seed: Optional[int] = Input(description="random seed to use for training", default=None),
) -> str:
    return "hello, weights"

The Input() function takes these keyword arguments:

  • description: A description of what to pass to this input for users of the model.
  • default: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to None, the input is optional.
  • ge: For int or float types, the value must be greater than or equal to this number.
  • le: For int or float types, the value must be less than or equal to this number.
  • min_length: For str types, the minimum length of the string.
  • max_length: For str types, the maximum length of the string.
  • regex: For str types, the string must match this regular expression.
  • choices: For str or int types, a list of possible values for this input.
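To show what these constraints mean for a submitted value, here is a hypothetical validator that reports which constraints a value violates. It is a sketch, not Cog's validation code; using a full match for regex is an assumption.

```python
import re
from typing import Any, Optional, Sequence


def check_input(
    value: Any,
    ge: Optional[float] = None,
    le: Optional[float] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    regex: Optional[str] = None,
    choices: Optional[Sequence[Any]] = None,
) -> list[str]:
    """Hypothetical validator: return the list of violated Input() constraints."""
    errors: list[str] = []
    if ge is not None and value < ge:
        errors.append(f"must be >= {ge}")
    if le is not None and value > le:
        errors.append(f"must be <= {le}")
    if min_length is not None and len(value) < min_length:
        errors.append(f"length must be >= {min_length}")
    if max_length is not None and len(value) > max_length:
        errors.append(f"length must be <= {max_length}")
    if regex is not None and re.fullmatch(regex, value) is None:  # assumption: whole-string match
        errors.append(f"must match {regex!r}")
    if choices is not None and value not in choices:
        errors.append(f"must be one of {list(choices)}")
    return errors


print(check_input(-1.0, ge=0))  # ['must be >= 0']
```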

Each parameter of the train() function must be annotated with a type like str, int, float, bool, etc. See Input and output types for the full list of supported types.

Using the Input function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:

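def train(
    iterations: int,                 # no default: valid, and the input is required
    training_data: str = "foo bar",  # plain Python default: also valid
) -> str:
    # Python requires parameters without defaults to come before those with defaults
    return "hello, weights"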

Training Output

Training output is typically a binary weights file. To return a custom object with multiple values, define a TrainingOutput class with the fields you need and use it as the return type of your train() function with Python's -> annotation:

from cog import BaseModel, Input, Path

class TrainingOutput(BaseModel):
    weights: Path

def train(
    train_data: Path = Input(description="HTTPS URL of a file containing training data"),
    learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
    seed: int = Input(description="random seed to use for training", default=42)
) -> TrainingOutput:
    weights_file = generate_weights("...")  # generate_weights is a placeholder for your training code
    return TrainingOutput(weights=Path(weights_file))

Testing

If you are developing a fine-tunable Cog model like Llama or SDXL, you can test that the fine-tuned code path works before pushing by setting the COG_WEIGHTS environment variable when you call cog run:

cog run -e COG_WEIGHTS=https://replicate.delivery/pbxt/xyz/weights.tar -i prompt="a photo of TOK"

Using cog on Windows 11 with WSL 2

Running cog on Windows is now possible thanks to WSL 2. Follow this guide to enable WSL 2 and G
