Cog: Containers for machine learning
Cog is an open-source tool that lets you package machine learning models in a standard, production-ready container.
You can deploy your packaged model to your own infrastructure, or to Replicate.
Highlights
- 📦 Docker containers without the pain. Writing your own Dockerfile can be a bewildering process. With Cog, you define your environment with a simple configuration file and it generates a Docker image with all the best practices: Nvidia base images, efficient caching of dependencies, installing specific Python versions, sensible environment variable defaults, and so on.
- 🤬️ No more CUDA hell. Cog knows which CUDA/cuDNN/PyTorch/TensorFlow/Python combos are compatible and will set it all up correctly for you.
- ✅ Define the inputs and outputs for your model with standard Python. Then, Cog generates an OpenAPI schema and validates the inputs and outputs.
- 🎁 Automatic HTTP inference server: Your model's types are used to dynamically generate a RESTful HTTP API using a high-performance Rust/Axum server.
- 🚀 Ready for production. Deploy your model anywhere that Docker images run. Your own infrastructure, or Replicate.
How it works
Define the Docker environment your model runs in with cog.yaml:
build:
  gpu: true
  system_packages:
    - "libgl1"
    - "libglib2.0-0"
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"
Define how your model runs with run.py:
from cog import BaseRunner, Input, Path
import torch

class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.model = torch.load("./weights.pth")

    # The arguments and types the model takes as input
    def run(self,
          image: Path = Input(description="Grayscale input image")
    ) -> Path:
        """Run the model"""
        processed_image = preprocess(image)
        output = self.model(processed_image)
        return postprocess(output)
In the above we accept a path to the image as an input, and return a path to our transformed image after running it through our model.
Now, you can run the model:
$ cog run -i image=@input.jpg
--> Building Docker image...
--> Running...
--> Output written to output.jpg
Or, build a Docker image for deployment:
$ cog build -t my-classification-model
--> Building Docker image...
--> Built my-classification-model:latest
$ docker run -d -p 5000:5000 --gpus all my-classification-model
$ curl http://localhost:5000/predictions -X POST \
-H 'Content-Type: application/json' \
-d '{"input": {"image": "https://.../input.jpg"}}'
Or, combine build and run via the serve command:
$ cog serve -p 8080
$ curl http://localhost:8080/predictions -X POST \
-H 'Content-Type: application/json' \
-d '{"input": {"image": "https://.../input.jpg"}}'
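If you would rather call the endpoint from Python than from curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names build_prediction_request and post_prediction are hypothetical, and the base URL assumes a server started with cog serve -p 8080 as above.

```python
import json
import urllib.request


def build_prediction_request(base_url: str, inputs: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /predictions endpoint."""
    # The body mirrors the curl example: inputs are nested under "input".
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_prediction(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, post_prediction(build_prediction_request("http://localhost:8080", {"image": image_url})) sends the same JSON body as the curl command, where image_url stands in for a URL your model can fetch.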
Why are we building this?
It's really hard for researchers to ship machine learning models to production.
Part of the solution is Docker, but it is so complex to get it to work: Dockerfiles, pre-/post-processing, Flask servers, CUDA versions. More often than not the researcher has to sit down with an engineer to get the damn thing deployed.
Andreas and Ben created Cog. Andreas used to work at Spotify, where he built tools for building and deploying ML models with Docker. Ben worked at Docker, where he created Docker Compose.
We realized that, in addition to Spotify, other companies were also using Docker to build and deploy machine learning models. Uber and others have built similar systems. So, we're making an open source version so other people can do this too.
Hit us up if you're interested in using it or want to collaborate with us. We're on Discord or email us at team@replicate.com.
Prerequisites
- macOS, Linux or Windows 11. Cog works on macOS, Linux, and Windows 11 with WSL 2.
- Docker. Cog uses Docker to create a container for your model. You'll need to install Docker before you can run Cog. If you install Docker Engine instead of Docker Desktop, you will need to install Buildx as well.
Install
If you're using macOS, you can install Cog using Homebrew:
brew install replicate/tap/cog
You can also download and install the latest release using our install script:
# bash, zsh, and other shells
sh <(curl -fsSL https://cog.run/install.sh)
# fish shell
sh (curl -fsSL https://cog.run/install.sh | psub)
# download with wget and run in a separate command
wget -qO install.sh https://cog.run/install.sh
sh ./install.sh
You can manually install the latest release of Cog directly from GitHub by running the following commands in a terminal:
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
Or, if you are installing Cog in a Dockerfile:
RUN sh -c "INSTALL_DIR=\"/usr/local/bin\" SUDO=\"\" $(curl -fsSL https://cog.run/install.sh)"
Upgrade
If you're using macOS and you previously installed Cog with Homebrew, run the following:
brew upgrade replicate/tap/cog
Otherwise, you can upgrade to the latest version by running the same commands you used to install it.
Development
See CONTRIBUTING.md for how to set up a development environment and build from source.
Next steps
- Get started with an example model
- Get started with your own model
- Using Cog with notebooks
- Using Cog with Windows 11
- Take a look at some examples of using Cog
- Deploy models with Cog
- cog.yaml reference to learn how to define your model's environment
- Run interface reference to learn how the Runner interface works
- Training interface reference to learn how to add a fine-tuning API to your model
- HTTP API reference to learn how to use the HTTP API that models serve
Need help?
CLI reference
cog
Containers for machine learning.
To get started, take a look at the documentation: https://github.com/replicate/cog
Examples
To execute a command inside a Docker environment defined with Cog:
$ cog exec echo hello world
Options
--debug Show debugging output
-h, --help help for cog
--no-color Disable colored output
--version Show version of Cog
cog build
Build a Docker image from the cog.yaml in the current directory.
The generated image contains your model code, dependencies, and the Cog runtime. It can be run locally with 'cog run' or pushed to a registry with 'cog push'.
cog build [flags]
Examples
# Build with default settings
cog build
# Build and tag the image
cog build -t my-model:latest
# Build without using the cache
cog build --no-cache
# Build with model weights in a separate layer
cog build --separate-weights -t my-model:v1
Options
-f, --file string The name of the config file. (default "cog.yaml")
-h, --help help for build
--no-cache Do not use cache when building the image
--openapi-schema string Load OpenAPI schema from a file
--progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
--secret stringArray Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file'
--separate-weights Separate model weights from code in image layers
-t, --tag string A name for the built image in the form 'repository:tag'
--use-cog-base-image Use pre-built Cog base image for faster cold boots (default true)
--use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
cog doctor
Diagnose and fix common issues in your Cog project.
NOTE: cog doctor is experimental. Behavior and checks may change in future versions.
By default, cog doctor reports problems without modifying any files. Pass --fix to automatically apply safe fixes.
cog doctor [flags]
Options
-f, --file string The name of the config file. (default "cog.yaml")
--fix Automatically apply fixes
-h, --help help for doctor
cog exec
Execute a command inside a Docker environment defined by cog.yaml.
Cog builds a temporary image from your cog.yaml configuration and runs the given command inside it. This is useful for debugging, running scripts, or exploring the environment your model will run in.
cog exec <command> [arg...] [flags]
Examples
# Open a Python interpreter inside the model environment
cog exec python
# Run a script
cog exec python train.py
# Run with environment variables
cog exec -e HUGGING_FACE_HUB_TOKEN=abc123 python download.py
# Expose a port (e.g. for Jupyter)
cog exec -p 8888 jupyter notebook
Options
-e, --env stringArray Environment variables, in the form name=value
-f, --file string The name of the config file. (default "cog.yaml")
--gpus string GPU devices to add to the container, in the same format as docker run --gpus.
-h, --help help for exec
--progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
-p, --publish stringArray Publish a container's port to the host, e.g. -p 8000
--use-cog-base-image Use pre-built Cog base image for faster cold boots (default true)
--use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
cog init
Create a cog.yaml and run.py in the current directory.
These files provide a starting template for defining your model's environment and run interface. Edit them to match your model's requirements.
cog init [flags]
Examples
# Set up a new Cog project in the current directory
cog init
Options
-h, --help help for init
cog login
Log in to a container registry.
For Replicate's registry (r8.im), this command handles authentication through Replicate's token-based flow.
For other registries, this command prompts for username and password, then stores credentials using Docker's credential system.
cog login [flags]
Options
-h, --help help for login
--token-stdin Pass login token on stdin instead of opening a browser. You can find your Replicate login token at https://replicate.com/auth/token
cog push
Build a Docker image from cog.yaml and push it to a container registry.
Cog can push to any OCI-compliant registry. When pushing to Replicate's registry (r8.im), run 'cog login' first to authenticate.
cog push [IMAGE] [flags]
Examples
# Push to Replicate
cog push r8.im/your-username/my-model
# Push to any OCI registry
cog push registry.example.com/your-username/model-name
# Push with model weights in a separate layer (Replicate only)
cog push r8.im/your-username/my-model --separate-weights
Options
-f, --file string The name of the config file. (default "cog.yaml")
-h, --help help for push
--no-cache Do not use cache when building the image
--openapi-schema string Load OpenAPI schema from a file
--progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
--secret stringArray Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file'
--separate-weights Separate model weights from code in image layers
--use-cog-base-image Use pre-built Cog base image for faster cold boots (default true)
--use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
cog run
Run the model.
If 'image' is passed, it will run the model on that Docker image. It must be an image that has been built by Cog.
Otherwise, it will build the model in the current directory and run it.
cog run [image] [flags]
Examples
# Run the model with named inputs
cog run -i prompt="a photo of a cat"
# Pass a file as input
cog run -i image=@photo.jpg
# Save output to a file
cog run -i image=@input.jpg -o output.png
# Pass multiple inputs
cog run -i prompt="sunset" -i width=1024 -i height=768
# Run against a pre-built image
cog run r8.im/your-username/my-model -i prompt="hello"
# Pass inputs as JSON
echo '{"prompt": "a cat"}' | cog run --json @-
Options
-e, --env stringArray Environment variables, in the form name=value
-f, --file string The name of the config file. (default "cog.yaml")
--gpus string GPU devices to add to the container, in the same format as docker run --gpus.
-h, --help help for run
-i, --input stringArray Inputs, in the form name=value. If value is prefixed with @, then it is read from a file on disk. E.g. -i path=@image.jpg
--json string Pass inputs as JSON object, read from file (@inputs.json) or via stdin (@-)
-o, --output string Output path
--progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
--setup-timeout uint32 The timeout for a container to setup (in seconds). (default 300)
--use-cog-base-image Use pre-built Cog base image for faster cold boots (default true)
--use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
--use-replicate-token Pass REPLICATE_API_TOKEN from local environment into the model context
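When you drive cog run from a script, it can help to assemble the -i arguments from a dictionary instead of formatting them by hand. The sketch below only builds the argument list using the -i name=value, -i name=@file, and -o forms documented above. The function name cog_run_args is hypothetical, and the sketch assumes plain string or numeric values.

```python
from typing import List, Mapping, Optional


def cog_run_args(
    inputs: Mapping[str, object],
    files: Optional[Mapping[str, str]] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `cog run` from plain values and file paths."""
    args = ["cog", "run"]
    for name, value in inputs.items():
        # Plain values are passed as name=value.
        args += ["-i", f"{name}={value}"]
    for name, path in (files or {}).items():
        # The @ prefix tells Cog to read the value from a file on disk.
        args += ["-i", f"{name}=@{path}"]
    if output is not None:
        args += ["-o", output]
    return args
```

The resulting list can be handed to subprocess.run(args, check=True), which avoids shell quoting problems for values that contain spaces.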
cog serve
Run an HTTP server.
Builds the model and starts an HTTP server that exposes the model's inputs and outputs as a REST API. Compatible with the Cog HTTP protocol.
cog serve [flags]
Examples
# Start the server on the default port (8393)
cog serve
# Start on a custom port
cog serve -p 5000
# Test the server
curl http://localhost:8393/predictions \
-X POST \
-H 'Content-Type: application/json' \
-d '{"input": {"prompt": "a cat"}}'
Options
-f, --file string The name of the config file. (default "cog.yaml")
--gpus string GPU devices to add to the container, in the same format as docker run --gpus.
-h, --help help for serve
-p, --port int Port on which to listen (default 8393)
--progress string Set type of build progress output, 'auto' (default), 'tty', 'plain', or 'quiet' (default "auto")
--upload-url string Upload URL for file outputs (e.g. https://example.com/upload/)
--use-cog-base-image Use pre-built Cog base image for faster cold boots (default true)
--use-cuda-base-image string Use Nvidia CUDA base image, 'true' (default) or 'false' (use python base image). False results in a smaller image but may cause problems for non-torch projects (default "auto")
Deploy models with Cog
Cog containers are Docker containers that serve an HTTP server for running your model. You can deploy them anywhere that Docker containers run.
The server inside Cog containers is coglet, a Rust-based inference server that handles HTTP requests, worker process management, and run execution.
This guide assumes you have a model packaged with Cog. If you don't, follow our getting started guide, or use an example model.
Getting started
First, build your model:
cog build -t my-model
You can serve your model locally with cog serve:
cog serve
# or, from a built image:
cog serve my-model
Alternatively, start the Docker container directly:
# If your model uses a CPU:
docker run -d -p 5001:5000 my-model
# If your model uses a GPU:
docker run -d -p 5001:5000 --gpus all my-model
The server listens on port 5000 inside the container (mapped to 5001 above).
To view the OpenAPI schema, open localhost:5001/openapi.json in your browser or use cURL to make a request:
curl http://localhost:5001/openapi.json
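To stop the server, kill its container. Pass the container ID or name reported by docker ps, since my-model is the image name rather than the container name:
docker kill <container-id>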
To run the model, call the /predictions endpoint, passing input in the format expected by your model:
curl http://localhost:5001/predictions -X POST \
--header "Content-Type: application/json" \
--data '{"input": {"image": "https://.../input.jpg"}}'
For more details about the HTTP API, see the HTTP API reference documentation.
Health checks
The server exposes a GET /health-check endpoint that returns the current status of the model container. Use this for readiness probes in orchestration systems like Kubernetes.
curl http://localhost:5001/health-check
The response includes a status field with values like STARTING, READY, BUSY, SETUP_FAILED, or DEFUNCT. See the HTTP API reference for full details.
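A readiness check built on this endpoint might look like the following sketch. It assumes the response is JSON with a status field, as described above. Treating SETUP_FAILED and DEFUNCT as terminal, and continuing to poll on BUSY, are assumptions of this example rather than documented behavior. The function names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_status(base_url: str) -> str:
    """Return the status field from GET /health-check (needs a running server)."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/health-check") as response:
        return json.load(response)["status"]


def wait_until_ready(
    get_status: Callable[[], str],
    attempts: int = 30,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status is READY; stop early on a failure status."""
    for _ in range(attempts):
        status = get_status()
        if status == "READY":
            return True
        if status in ("SETUP_FAILED", "DEFUNCT"):
            # Assumed to be unrecoverable, so there is no point in waiting.
            return False
        # STARTING, BUSY, or anything else: wait and try again.
        sleep(delay_seconds)
    return False
```

For the container started above, wait_until_ready(lambda: fetch_status("http://localhost:5001")) blocks until the model has finished setup or the attempts run out.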
Concurrency
By default, the server processes one run at a time. To enable concurrent runs, set the concurrency.max option in cog.yaml:
concurrency:
max: 4
See the cog.yaml reference for more details.
Environment variables
You can configure runtime behavior with environment variables:
- COG_SETUP_TIMEOUT: Maximum time in seconds for the setup() method (default: no timeout).
See the environment variables reference for the full list.
Environment variables
This reference lists the public Cog-specific environment variables that change how Cog behaves.
Build-time variables
COG_SDK_WHEEL
Controls which Cog Python SDK wheel is installed in the Docker image during cog build. Takes precedence over build.sdk_version in cog.yaml.
Supported values:
| Value | Description |
|---|---|
| pypi | Install latest version from PyPI |
| pypi:0.12.0 | Install specific version from PyPI |
| dist | Use wheel from dist/ directory (requires git repo) |
| https://... | Install from URL |
| /path/to/wheel.whl | Install from local file path |
Default behaviour:
- Release builds install the latest Cog SDK from PyPI.
- Development builds auto-detect a wheel in dist/, then fall back to the latest Cog SDK from PyPI.
$ COG_SDK_WHEEL=pypi:0.11.0 cog build
$ COG_SDK_WHEEL=dist cog build
$ COG_SDK_WHEEL=https://example.com/cog-0.12.0-py3-none-any.whl cog build
The dist option searches for wheels in:
- ./dist/ (current directory)
- $REPO_ROOT/dist/ (if REPO_ROOT is set)
- <git-repo-root>/dist/ (via git rev-parse, useful when running from subdirectories)
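To make the supported value forms concrete, here is a small sketch that sorts a COG_SDK_WHEEL value into the categories from the table above. It illustrates the documented forms only and is not Cog's actual parsing code. The function name classify_wheel_setting is hypothetical, and accepting http:// alongside https:// is an assumption.

```python
from typing import Optional, Tuple


def classify_wheel_setting(value: str) -> Tuple[str, Optional[str]]:
    """Classify a COG_SDK_WHEEL-style value into (kind, detail)."""
    if value == "pypi":
        return ("pypi", None)  # latest version from PyPI
    if value.startswith("pypi:"):
        return ("pypi", value[len("pypi:"):])  # pinned version
    if value == "dist":
        return ("dist", None)  # wheel from a dist/ directory
    if value.startswith(("http://", "https://")):
        return ("url", value)  # http:// is accepted here as an assumption
    # Anything else is treated as a local file path.
    return ("path", value)
```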
COGLET_WHEEL
Controls which coglet wheel is installed in the Docker image. Coglet is the Rust-based inference server.
Supported values: Same as COG_SDK_WHEEL.
Default behaviour: For development builds, auto-detects a wheel in dist/. For release builds, installs the latest version from PyPI.
$ COGLET_WHEEL=dist cog build
$ COGLET_WHEEL=pypi:0.1.0 cog build
COG_CA_CERT
Injects a custom CA certificate into the Docker image during cog build. This is useful when building behind a corporate proxy or VPN that uses custom certificate authorities (for example, Cloudflare WARP).
Supported values:
| Value | Description |
|---|---|
| /path/to/cert.crt | Path to a single PEM certificate file |
| /path/to/certs/ | Directory of .crt and .pem files (all are concatenated) |
| -----BEGIN CERTIFICATE-----... | Inline PEM certificate |
| LS0tLS1CRUdJTi... | Base64-encoded PEM certificate |
The certificate is installed into the system CA store and the SSL_CERT_FILE and REQUESTS_CA_BUNDLE environment variables are set automatically in the built image.
$ COG_CA_CERT=/usr/local/share/ca-certificates/corporate-ca.crt cog build
$ COG_CA_CERT=/etc/custom-certs/ cog build
$ COG_CA_CERT="$(cat /path/to/cert.pem)" cog build
COG_OPENAPI_SCHEMA
Uses a pre-built OpenAPI schema instead of generating one from the configured predict or train reference.
The value must be a path to a JSON schema file. Cog reads that file during schema generation and embeds it in the built image.
$ COG_OPENAPI_SCHEMA=./openapi.json cog build
CLI and local cache variables
COG_NO_UPDATE_CHECK
Disables Cog's automatic update check. Set it to any non-empty value.
$ COG_NO_UPDATE_CHECK=1 cog build
COG_NO_COLOR
Disables coloured CLI output. Set it to any non-empty value.
Cog also honours the standard NO_COLOR environment variable.
$ COG_NO_COLOR=1 cog run -i prompt="hello"
COG_SKIP_DOCKER_CHECK
Skips the cog doctor Docker environment check. Set it to any non-empty value.
$ COG_SKIP_DOCKER_CHECK=1 cog doctor
COG_CACHE_DIR
Overrides Cog's local cache root.
Cog currently uses this cache for the content-addressed weights store. If unset, Cog uses $XDG_CACHE_HOME/cog when XDG_CACHE_HOME is set, otherwise $HOME/.cache/cog.
$ COG_CACHE_DIR=/mnt/fast-cache cog weights pull
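The lookup order described above can be written out as a short sketch. This mirrors the documented fallback chain only. The function name resolve_cache_dir is hypothetical, and treating an empty variable as unset is an assumption.

```python
from pathlib import Path
from typing import Mapping


def resolve_cache_dir(env: Mapping[str, str]) -> Path:
    """Pick the cache root: COG_CACHE_DIR, then XDG_CACHE_HOME, then HOME."""
    if env.get("COG_CACHE_DIR"):
        return Path(env["COG_CACHE_DIR"])
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "cog"
    return Path(env["HOME"]) / ".cache" / "cog"
```

Calling resolve_cache_dir(os.environ) after import os shows which directory would apply in your current shell under these assumptions.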
Model reference and registry variables
COG_MODEL
Overrides the full model reference used by commands that need a model destination, such as cog push and weights commands.
The value is parsed as a complete model reference (registry/repo, registry/repo:tag, or registry/repo@digest). If no tag is supplied, Cog generates a timestamp tag.
When COG_MODEL is set, it takes precedence over COG_MODEL_REGISTRY, COG_MODEL_REPO, and COG_MODEL_TAG.
$ COG_MODEL=r8.im/acme/my-model:v1 cog push
COG_MODEL_REGISTRY
Overrides only the registry host of the model reference.
$ COG_MODEL_REGISTRY=registry.example.com cog push
COG_MODEL_REPO
Overrides only the repository path of the model reference. The value must not include a registry host, tag, or digest.
$ COG_MODEL_REPO=acme/my-model cog push
COG_MODEL_TAG
Overrides only the tag of the model reference.
Tags starting with cog- are reserved for tags that Cog generates internally and are rejected.
$ COG_MODEL_TAG=staging cog push
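The precedence between these four variables can be pictured with a rough sketch. It is an illustration of the rules stated above, not Cog's implementation. ModelRef, parse_ref, and resolve_model_ref are hypothetical names, the base argument stands in for whatever reference the command would otherwise use, and digest references are left out for brevity.

```python
from typing import Mapping, NamedTuple, Optional


class ModelRef(NamedTuple):
    registry: str
    repo: str
    tag: Optional[str]


def parse_ref(ref: str) -> ModelRef:
    """Parse registry/repo or registry/repo:tag (digests are not handled here)."""
    if "@" in ref:
        raise ValueError("digest references are outside the scope of this sketch")
    registry, _, rest = ref.partition("/")
    repo, _, tag = rest.partition(":")
    return ModelRef(registry, repo, tag or None)


def resolve_model_ref(base: ModelRef, env: Mapping[str, str]) -> ModelRef:
    """Apply the overrides; COG_MODEL wins over the per-part variables."""
    if env.get("COG_MODEL"):
        return parse_ref(env["COG_MODEL"])
    tag = env.get("COG_MODEL_TAG")
    if tag and tag.startswith("cog-"):
        # Tags starting with cog- are reserved and rejected.
        raise ValueError(f"reserved tag: {tag}")
    return ModelRef(
        registry=env.get("COG_MODEL_REGISTRY") or base.registry,
        repo=env.get("COG_MODEL_REPO") or base.repo,
        tag=tag or base.tag,
    )
```

A tag of None in the result corresponds to the case where Cog would generate a timestamp tag.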
COG_REGISTRY_HOST
Changes the default Replicate-compatible registry host used by commands such as cog login, base image resolution, and model reference resolution.
The default is r8.im.
$ COG_REGISTRY_HOST=registry.example.com cog login
Runtime server variables
These variables affect a running model server. Set them in cog.yaml under environment, pass them with cog run -e or cog serve -e, or set them when running the built Docker image.
COG_MAX_CONCURRENCY
Controls how many predictions the model server can run concurrently.
By default, Cog runs one prediction at a time. Invalid values are ignored and the default of 1 is used.
$ COG_MAX_CONCURRENCY=4 docker run -p 5000:5000 my-model
COG_SETUP_TIMEOUT
Controls the maximum time, in seconds, allowed for the model's setup() method to complete. If setup exceeds this timeout, the server reports setup failure.
By default, there is no timeout. Set to 0 to disable the timeout. Invalid values are ignored with a warning.
$ COG_SETUP_TIMEOUT=300 docker run -p 5000:5000 my-model
COG_LOG_LEVEL
Controls Coglet runtime log verbosity when RUST_LOG is not set.
Supported values are debug, info, warn, warning, and error. The default is info.
$ COG_LOG_LEVEL=debug docker run -p 5000:5000 my-model
COG_THROTTLE_RESPONSE_INTERVAL
Controls how often asynchronous webhook output and logs events are sent, in seconds.
The default is 0.5 seconds. Invalid values are ignored and the default is used. start and completed webhook events are always sent immediately.
$ COG_THROTTLE_RESPONSE_INTERVAL=1 docker run -p 5000:5000 my-model
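The throttling rule can be illustrated with a small sketch: start and completed events always go out, while output and logs events are limited to one per interval. This shows the general idea only. The class name WebhookThrottle is hypothetical, and whether an immediate event resets the timer is not specified above, so this sketch leaves the timer untouched.

```python
import time
from typing import Callable, Optional


class WebhookThrottle:
    """Decide whether a webhook event should be sent now or skipped."""

    def __init__(
        self,
        interval_seconds: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._interval = interval_seconds
        self._clock = clock
        self._last_sent: Optional[float] = None

    def should_send(self, event: str) -> bool:
        if event in ("start", "completed"):
            # These events are always sent immediately.
            return True
        now = self._clock()
        if self._last_sent is None or now - self._last_sent >= self._interval:
            self._last_sent = now
            return True
        # Too soon since the last throttled event.
        return False
```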
COG_STREAM_HISTORY_CAPACITY
Controls how many server-sent event stream events are retained per prediction for replay when a client reconnects with Accept: text/event-stream.
By default, Cog retains the most recent 1024 events per prediction. Set to 0 to disable replay history while keeping live streaming enabled. Invalid values are ignored with a warning and the default is used.
$ COG_STREAM_HISTORY_CAPACITY=0 docker run -p 5000:5000 my-model
$ COG_STREAM_HISTORY_CAPACITY=4096 docker run -p 5000:5000 my-model
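A bounded replay history like this behaves much like a fixed-size queue that drops its oldest entry when full. The sketch below models that behavior with the standard library. It is an analogy for the setting, not the server's code, and the class name StreamHistory is hypothetical.

```python
from collections import deque
from typing import List


class StreamHistory:
    """Keep only the most recent `capacity` events for replay."""

    def __init__(self, capacity: int = 1024) -> None:
        # A deque with maxlen discards the oldest item once it is full.
        # With maxlen=0 it retains nothing, which models disabled replay.
        self._events = deque(maxlen=capacity)

    def record(self, event: str) -> None:
        self._events.append(event)

    def replay(self) -> List[str]:
        """Return the retained events, oldest first."""
        return list(self._events)
```

A larger capacity lets a reconnecting client recover more of the stream at the cost of memory held per prediction.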
COG_WEIGHTS
Provides a weights path or URL to a model whose setup() method accepts a weights parameter.
$ cog run -e COG_WEIGHTS=https://example.com/weights.tar -i prompt="hello"
COG_USER_AGENT
Sets the User-Agent header used by Cog when downloading URL-backed File inputs.
$ COG_USER_AGENT="my-service/1.0" docker run -p 5000:5000 my-model
Push tuning variables
COG_PUSH_OCI
Enables Cog's OCI chunked push path for container image layers when set to 1. If the OCI push fails with a non-fatal error, Cog falls back to Docker's native push path.
$ COG_PUSH_OCI=1 cog push
COG_PUSH_CONCURRENCY
Controls how many image layers or weight blobs Cog uploads concurrently during push operations.
The default is 5. Invalid values and values less than 1 are ignored.
$ COG_PUSH_CONCURRENCY=2 cog push
COG_PUSH_DEFAULT_CHUNK_SIZE
Sets the default multipart upload chunk size, in bytes, when the registry does not advertise a maximum chunk size.
The default is 96 MiB. Invalid values and values less than 1 are ignored.
$ COG_PUSH_DEFAULT_CHUNK_SIZE=67108864 cog push
COG_PUSH_MULTIPART_THRESHOLD
Sets the minimum blob size, in bytes, before Cog uses multipart upload.
The default is 128 MiB. Invalid values and values less than 1 are ignored.
$ COG_PUSH_MULTIPART_THRESHOLD=268435456 cog push
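To see how the chunk size and threshold interact, consider this sketch that plans the byte ranges for one blob using the defaults above. It is an illustration under assumptions: the function name plan_upload is hypothetical, and a blob exactly at the threshold is assumed to use multipart upload.

```python
from typing import List, Tuple

MIB = 1024 * 1024
DEFAULT_CHUNK_SIZE = 96 * MIB
DEFAULT_MULTIPART_THRESHOLD = 128 * MIB


def plan_upload(
    blob_size: int,
    chunk_size: int = DEFAULT_CHUNK_SIZE,
    threshold: int = DEFAULT_MULTIPART_THRESHOLD,
) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs describing how a blob would be uploaded."""
    if blob_size < threshold:
        # Small blobs go up in a single request.
        return [(0, blob_size)]
    return [
        (offset, min(chunk_size, blob_size - offset))
        for offset in range(0, blob_size, chunk_size)
    ]
```

Under these assumptions a 300 MiB layer is split into three 96 MiB chunks and one 12 MiB chunk, while a 100 MiB layer is uploaded whole.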
Getting started with your own model
This guide will show you how to put your own machine learning model in a Docker image using Cog. If you haven't got a model to try out, you'll want to follow the main getting started guide.
Prerequisites
- macOS, Linux, or Windows 11 with WSL 2. Cog works on macOS and Linux, and on Windows 11 through WSL 2.
- Docker. Cog uses Docker to create a container for your model. You'll need to install Docker before you can run Cog.
Initialization
First, install Cog if you haven't already:
macOS (recommended):
brew install replicate/tap/cog
Linux or macOS (manual):
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
To configure your project for use with Cog, you'll need to add two files:
- cog.yaml defines system requirements, Python package dependencies, etc.
- run.py describes the run interface for your model
Use the cog init command to generate these files in your project:
$ cd path/to/your/model
$ cog init
Define the Docker environment
The cog.yaml file defines all the different things that need to be installed for your model to run. You can think of it as a simple way of defining a Docker image.
For example:
build:
  python_version: "3.13"
  python_requirements: requirements.txt
With a requirements.txt containing your dependencies:
torch==2.6.0
This will generate a Docker image with Python 3.13 and PyTorch 2 installed, for both CPU and GPU, with the correct version of CUDA, and various other sensible best-practices.
To run a command inside this environment, prefix it with cog exec:
$ cog exec python
✓ Building Docker image from cog.yaml... Successfully built 8f54020c8981
Running 'python' in Docker with the current directory mounted as a volume...
────────────────────────────────────────────────────────────────────────────────────────
Python 3.13.x (main, ...)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
This is handy for ensuring a consistent environment for development or training.
With cog.yaml, you can also install system packages and other things. Take a look at the full reference to see what else you can do.
Define how to run your model
The next step is to update run.py to define the interface for running your model. The run.py generated by cog init looks something like this:
from cog import BaseRunner, Path, Input
import torch

class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.net = torch.load("weights.pth")

    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5)
    ) -> Path:
        """Run the model"""
        # ... pre-processing ...
        output = self.net(input)
        # ... post-processing ...
        return output
Edit your run.py file and fill in the functions with your own model's setup and run code. You might need to import parts of your model from another file.
You also need to define the inputs to your model as arguments to the run() function, as demonstrated above. For each argument, you need to annotate with a type. The supported types are:
- str: a string
- int: an integer
- float: a floating point number
- bool: a boolean
- cog.File: a file-like object representing a file (deprecated; use cog.Path instead)
- cog.Path: a path to a file on disk
You can provide more information about the input with the Input() function, as shown above. It takes these basic arguments:
- description: A description of what to pass to this input for users of the model
- default: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to None, the input is optional.
- ge: For int or float types, the value should be greater than or equal to this number.
- le: For int or float types, the value should be less than or equal to this number.
- min_length: For str types, the minimum length of the string.
- max_length: For str types, the maximum length of the string.
- regex: For str types, the string must match this regular expression.
- choices: For str or int types, a list of possible values for this input.
- deprecated: Mark this input as deprecated with a message explaining what to use instead.
There are some more advanced options you can pass, too. For more details, take a look at the run interface documentation.
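To make the meaning of the numeric, length, and choices constraints concrete, here is a plain-Python sketch of the checks they describe. It does not use or reproduce Cog's own validation. The function name check_input is hypothetical, and regex is left out because its exact matching semantics are not covered here.

```python
from typing import Optional, Sequence, Union

Number = Union[int, float]


def check_input(
    value,
    *,
    ge: Optional[Number] = None,
    le: Optional[Number] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    choices: Optional[Sequence] = None,
) -> None:
    """Raise ValueError if value violates any of the given constraints."""
    if ge is not None and value < ge:
        raise ValueError(f"{value!r} must be greater than or equal to {ge}")
    if le is not None and value > le:
        raise ValueError(f"{value!r} must be less than or equal to {le}")
    if min_length is not None and len(value) < min_length:
        raise ValueError(f"{value!r} is shorter than {min_length} characters")
    if max_length is not None and len(value) > max_length:
        raise ValueError(f"{value!r} is longer than {max_length} characters")
    if choices is not None and value not in choices:
        raise ValueError(f"{value!r} is not one of {list(choices)}")
```

For example, a scale input declared with ge=1.0 and le=4.0 would accept 1.5 and reject 5.0 under these rules.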
Next, add the line run: "run.py:Runner" to your cog.yaml, so it looks something like this:
build:
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"
That's it! To test this works, try running the model:
$ cog run -i image=@input.jpg
✓ Building Docker image from cog.yaml... Successfully built 664ef88bc1f4
✓ Model running in Docker image 664ef88bc1f4
Written output to output.png
To pass more inputs to the model, you can add more -i options:
$ cog run -i image=@image.jpg -i scale=2.0
In this case it is just a number, not a file, so you don't need the @ prefix.
Using GPUs
To use GPUs with Cog, add the gpu: true option to the build section of your cog.yaml:
build:
  gpu: true
  ...
Cog will use the nvidia-docker base image and automatically figure out what versions of CUDA and cuDNN to use based on the versions of Python, PyTorch, and TensorFlow that you are using.
For more details, see the gpu section of the cog.yaml reference.
Next steps
Next, you might want to take a look at:
- A guide explaining how to deploy a model.
- The reference for cog.yaml
- The reference for the Python library
Getting started
This guide will walk you through what you can do with Cog by using an example model.
[!TIP] Using a language model to help you write the code for your new Cog model?
Feed it https://cog.run/llms.txt, which has all of Cog's documentation bundled into a single file. To learn more about this format, check out llmstxt.org.
Prerequisites
- macOS, Linux, or Windows 11 with WSL 2. Cog works on macOS and Linux, and on Windows 11 through WSL 2.
- Docker. Cog uses Docker to create a container for your model. You'll need to install Docker before you can run Cog.
Install Cog
macOS (recommended):
brew install replicate/tap/cog
Linux or macOS (manual):
sudo curl -o /usr/local/bin/cog -L https://github.com/replicate/cog/releases/latest/download/cog_`uname -s`_`uname -m`
sudo chmod +x /usr/local/bin/cog
sudo xattr -d com.apple.quarantine /usr/local/bin/cog 2>/dev/null || true
[!NOTE] macOS: "cannot be opened because the developer cannot be verified"
If you downloaded the binary manually (via curl or a browser) and see this Gatekeeper warning, run:
sudo xattr -d com.apple.quarantine /usr/local/bin/cog
Installing via brew install replicate/tap/cog handles this automatically.
Create a project
Let's make a directory to work in:
mkdir cog-quickstart
cd cog-quickstart
Run commands
The simplest thing you can do with Cog is run a command inside a Docker environment.
The first thing you need to do is create a file called cog.yaml:
build:
  python_version: "3.13"
Then, you can run any command inside this environment. For example, enter
cog exec python
and you'll get an interactive Python shell:
✓ Building Docker image from cog.yaml... Successfully built 8f54020c8981
Running 'python' in Docker with the current directory mounted as a volume...
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Python 3.13.x (main, ...)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
(Hit Ctrl-D to exit the Python shell.)
Inside this Docker environment you can do anything – run a Jupyter notebook, your training script, your evaluation script, and so on.
Run a model
Let's pretend we've trained a model. With Cog, we can define how to run it in a standard way, so other people can easily run it without having to hunt around for a run script.
We need to write some code to describe how the model runs.
Save this to run.py:
import os

os.environ["TORCH_HOME"] = "."

import torch
from cog import BaseRunner, Input, Path
from PIL import Image
from torchvision import models

WEIGHTS = models.ResNet50_Weights.IMAGENET1K_V1


class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = models.resnet50(weights=WEIGHTS).to(self.device)
        self.model.eval()

    def run(self, image: Path = Input(description="Image to classify")) -> dict:
        """Run the model"""
        img = Image.open(image).convert("RGB")
        preds = self.model(WEIGHTS.transforms()(img).unsqueeze(0).to(self.device))
        top3 = preds[0].softmax(0).topk(3)
        categories = WEIGHTS.meta["categories"]
        return {categories[i]: p.detach().item() for p, i in zip(*top3)}
We also need to point Cog at this, and tell it what Python dependencies to install.
Save this to requirements.txt:
pillow==11.1.0
torch==2.6.0
torchvision==0.21.0
Then update cog.yaml to look like this:
build:
  python_version: "3.13"
  python_requirements: requirements.txt
run: "run.py:Runner"
[!TIP] If you have a machine with an NVIDIA GPU attached, add gpu: true to the build section of your cog.yaml to enable GPU acceleration.
Let's grab an image to test the model with:
IMAGE_URL=https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg
curl $IMAGE_URL > input.jpg
Now, let's run the model using Cog:
cog run -i image=@input.jpg
If you see the following output
{
"tiger_cat": 0.4874822497367859,
"tabby": 0.23169134557247162,
"Egyptian_cat": 0.09728282690048218
}
then it worked!
Note: The first time you run cog run, the build process will be triggered to generate a Docker container that can run your model. The next time you run cog run the pre-built container will be used.
Build an image
We can bake your model's code, the trained weights, and the Docker environment into a Docker image. This image serves an HTTP server, and can be deployed to anywhere that Docker runs to serve real-time inference.
cog build -t resnet
# Building Docker image...
# Built resnet:latest
You can run this image with cog run by passing the image name as an argument:
cog run resnet -i image=@input.jpg
Or, you can run it with Docker directly, and it'll serve an HTTP server:
docker run -d --rm -p 5000:5000 resnet
We can send inputs directly with curl:
curl http://localhost:5000/predictions -X POST \
-H 'Content-Type: application/json' \
-d '{"input": {"image": "https://gist.githubusercontent.com/bfirsh/3c2115692682ae260932a67d93fd94a8/raw/56b19f53f7643bb6c0b822c410c366c3a6244de2/mystery.jpg"}}'
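The same request can be sent from Python. The following is a minimal sketch using only the standard library; the function names build_prediction_request and create_prediction are illustrative and not part of Cog, and the base URL assumes the container started above is listening on port 5000.

```python
import json
import urllib.request


def build_prediction_request(base_url: str, inputs: dict) -> urllib.request.Request:
    """Build a synchronous POST /predictions request for a Cog server."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_prediction(base_url: str, inputs: dict) -> dict:
    """Send the request and return the decoded JSON prediction object."""
    request = build_prediction_request(base_url, inputs)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling create_prediction("http://localhost:5000", {"image": "<image URL>"}) should return the prediction object, with the model's result under the output key.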
As a shorthand, you can add the Docker image's name as an extra line in cog.yaml:
image: "r8.im/replicate/resnet"
Once you've done this, you can use cog push to build and push the image to a Docker registry:
cog push
# Building r8.im/replicate/resnet...
# Pushing r8.im/replicate/resnet...
# Pushed!
The Docker image is now accessible to anyone or any system that has access to this Docker registry.
Next steps
Those are the basics! Next, you might want to take a look at:
- A guide to help you set up your own model on Cog.
- A guide explaining how to deploy a model.
- Reference for cog.yaml
- Reference for the Python library
HTTP API
[!TIP] For information about how to run the HTTP server, see our documentation on deploying models.
When you run a Docker image built by Cog, it serves an HTTP API for making predictions.
The server supports both synchronous and asynchronous prediction creation:
- Synchronous: The server waits until the prediction is completed and responds with the result.
- Asynchronous: The server immediately returns a response and processes the prediction in the background.
The client can create a prediction asynchronously
by setting the Prefer: respond-async header in their request
or by requesting a streamed response with Accept: text/event-stream.
With Prefer: respond-async,
the server responds immediately after starting the prediction
with 202 Accepted status and a prediction object in status starting.
With Accept: text/event-stream,
the server responds with 200 OK and keeps the response open
as a server-sent event stream.
[!NOTE] For JSON responses, the only supported way to receive updates on the status of predictions started asynchronously is using webhooks. Polling for prediction status is not currently supported.
You can also use certain server endpoints to create predictions idempotently, such that if a client calls this endpoint more than once with the same ID (for example, due to a network interruption) while the prediction is still running, no new prediction is created. Instead, the client receives the response type requested by the retry: JSON for regular requests or a server-sent event stream for streaming requests.
Here's a summary of the prediction creation endpoints:
| Endpoint | Header | Behavior |
|---|---|---|
| POST /predictions | - | Synchronous, non-idempotent |
| POST /predictions | Prefer: respond-async | Asynchronous, non-idempotent |
| POST /predictions | Accept: text/event-stream | Streaming, non-idempotent |
| PUT /predictions/<prediction_id> | - | Synchronous, idempotent |
| PUT /predictions/<prediction_id> | Prefer: respond-async | Asynchronous, idempotent |
| PUT /predictions/<prediction_id> | Accept: text/event-stream | Streaming, idempotent |
Choose the endpoint that best fits your needs:
- Use synchronous endpoints when you want to wait for the prediction result.
- Use asynchronous endpoints when you want to start a prediction and receive updates via webhooks.
- Use streaming endpoints when you want to receive prediction lifecycle events over the HTTP response as they happen.
- Use idempotent endpoints when you need to safely retry requests without creating duplicate predictions.
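The choice between the six combinations comes down to two decisions: the response mode and whether you supply an ID. As a rough sketch, a client helper could encode them like this. The function name and the mode labels "sync", "async", and "stream" are made up for this example; only the methods, paths, and headers come from the summary of endpoints.

```python
from typing import Optional


def prediction_request_line(
    mode: str = "sync", prediction_id: Optional[str] = None
) -> tuple[str, str, dict[str, str]]:
    """Return (method, path, extra_headers) for a prediction creation request.

    mode is one of "sync", "async", or "stream". Passing prediction_id
    selects the idempotent PUT endpoint.
    """
    mode_headers = {
        "sync": {},
        "async": {"Prefer": "respond-async"},
        "stream": {"Accept": "text/event-stream"},
    }
    if mode not in mode_headers:
        raise ValueError(f"unknown mode: {mode!r}")
    headers = dict(mode_headers[mode])
    if prediction_id is None:
        return "POST", "/predictions", headers
    return "PUT", f"/predictions/{prediction_id}", headers
```

For example, prediction_request_line("async", "wjx3whax6rf4vphkegkhcvpv6a") selects the asynchronous, idempotent variant.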
Streaming predictions with server-sent events
To produce streamed prediction events,
the model must return an iterator and opt in to SSE streaming
with the streaming decorator.
from typing import Iterator

from cog import BaseRunner, Input, streaming


class Runner(BaseRunner):
    @streaming
    def run(self, prompt: str = Input(description="Prompt")) -> Iterator[str]:
        for token in generate_tokens(prompt):
            yield token
The decorator can also be written as @cog.streaming
or, if imported directly from cog, @streaming.
The parenthesized forms @cog.streaming() and @streaming() are also accepted.
Without the decorator,
iterator outputs still work in normal JSON responses,
but requests with Accept: text/event-stream return 406 Not Acceptable.
To consume a streamed prediction,
send the prediction request with Accept: text/event-stream:
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Accept: text/event-stream
{
"input": {"prompt": "Write a haiku about onions"}
}
The server starts the prediction asynchronously
and keeps the HTTP response open as a server-sent event stream.
Each event has an event name and JSON data payload:
event: start
data: {"id":"abc123","status":"processing"}
event: output
data: {"chunk":"Onions","index":0}
event: output
data: {"chunk":" bloom","index":1}
event: completed
data: {"id":"abc123","status":"succeeded","output":["Onions"," bloom"],"metrics":{"predict_time":0.42}}
Prediction streams can emit these event types:
- start: The prediction started processing.
- output: The model yielded an output chunk. The payload includes chunk and index.
- log: The model wrote to stdout or stderr. The payload includes source and data.
- metric: The model recorded a custom metric. The payload includes name, value, and mode.
- completed: The prediction reached a terminal state. The payload is the final prediction object, with status set to succeeded, failed, or canceled.
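If you are consuming the stream from Python without an SSE library, a small parser is enough. The sketch below assumes the stream follows the standard server-sent events framing (an event: line, one or more data: lines, then a blank line) and that every data payload is JSON, as in the example events. parse_sse is a name chosen for this example, not a Cog API.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from the lines of an SSE stream."""
    event_name = "message"  # SSE default when no "event:" field is sent
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield event_name, json.loads("\n".join(data_lines))
            event_name, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].lstrip(" "))
```

Concatenating the chunk field of each output event, in index order, reconstructs the streamed text.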
For command-line clients, use a client that prints the response as data arrives:
curl -N \
-H 'Accept: text/event-stream' \
-H 'Content-Type: application/json' \
-d '{"input":{"prompt":"Write a haiku about onions"}}' \
http://localhost:5000/predictions
For browser clients,
use fetch() or another client that supports request bodies.
The browser EventSource API only supports GET requests,
so it cannot create a prediction with POST /predictions or
PUT /predictions/<prediction_id>.
const response = await fetch("/predictions", {
method: "POST",
headers: {
"Content-Type": "application/json",
Accept: "text/event-stream",
},
body: JSON.stringify({ input: { prompt: "Write a haiku about onions" } }),
});
const reader = response.body.pipeThrough(new TextDecoderStream()).getReader();
while (true) {
const { value, done } = await reader.read();
if (done) break;
console.log(value);
}
Use PUT /predictions/<prediction_id> when the client needs safe retries
or wants to reconnect to an in-flight prediction by ID:
PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8
Accept: text/event-stream
{
"input": {"prompt": "Write a haiku about onions"}
}
If the prediction is still running,
the server returns a stream for the existing prediction
instead of creating a duplicate prediction.
If earlier events have been dropped from the replay buffer,
the stream emits an error event and closes.
The replay buffer keeps the most recent 1024 events by default.
Set COG_STREAM_HISTORY_CAPACITY to change this limit,
or set it to 0 to disable replay history while keeping live streaming enabled.
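To make the replay behavior concrete, here is a toy model of a bounded history. It is an illustration of the semantics described above, not Cog's actual implementation: the ReplayBuffer class and the use of LookupError to stand in for the error event are assumptions of this sketch.

```python
from collections import deque


class ReplayBuffer:
    """Keep only the most recent `capacity` events."""

    def __init__(self, capacity: int = 1024) -> None:
        self.events: deque = deque(maxlen=capacity)
        self.total_seen = 0  # number of events ever appended

    def append(self, event: dict) -> None:
        self.events.append(event)  # silently drops the oldest event when full
        self.total_seen += 1

    def replay(self) -> list:
        """Return the full history, or raise if early events were dropped."""
        if self.total_seen > len(self.events):
            raise LookupError("earlier events were dropped from the replay buffer")
        return list(self.events)
```

A reconnecting client can only be given a complete stream while the number of events emitted so far fits within the capacity, which is why long-running predictions may need a larger limit.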
Training endpoints do not support SSE streaming;
requests to /trainings with Accept: text/event-stream
return 406 Not Acceptable.
Webhooks
You can provide a webhook parameter in the client request body
when creating a prediction.
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async
{
"input": {"prompt": "A picture of an onion with sunglasses"},
"webhook": "https://example.com/webhook/prediction"
}
The server makes requests to the provided URL with the current state of the prediction object in the request body at the following times.
- start: Once, when the prediction starts (status is starting).
- output: Each time a run function generates an output (either once using return or multiple times using yield).
- logs: Each time the run function writes to stdout.
- completed: Once, when the prediction reaches a terminal state (status is succeeded, canceled, or failed).
Webhook requests for start and completed event types
are sent immediately.
Webhook requests for output and logs event types
are sent at most once every 500ms.
This interval is not configurable.
By default, the server sends requests for all event types.
Clients can specify which events trigger webhook requests
with the webhook_events_filter parameter in the prediction request body.
For example,
the following request specifies that webhooks are sent by the server
only at the start and end of the prediction:
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async
{
"input": {"prompt": "A picture of an onion with sunglasses"},
"webhook": "https://example.com/webhook/prediction",
"webhook_events_filter": ["start", "completed"]
}
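When building these request bodies in code, it can help to reject misspelled event names before the request is sent. The following sketch does that; prediction_body and WEBHOOK_EVENTS are names invented for this example, and the list of valid events is taken from the four event types described in this section.

```python
from typing import Optional, Sequence

# Event types listed in the Webhooks section.
WEBHOOK_EVENTS = ("start", "output", "logs", "completed")


def prediction_body(
    inputs: dict, webhook: str, events: Optional[Sequence[str]] = None
) -> dict:
    """Build a prediction request body with an optional webhook event filter."""
    body = {"input": inputs, "webhook": webhook}
    if events is not None:
        unknown = sorted(set(events) - set(WEBHOOK_EVENTS))
        if unknown:
            raise ValueError(f"unknown webhook events: {unknown}")
        body["webhook_events_filter"] = list(events)
    return body
```

Omitting events leaves the filter out of the body, so the server falls back to its default of sending all event types.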
Generating unique prediction IDs
Endpoints for creating and canceling a prediction idempotently
accept a prediction_id parameter in their path.
By default, the server runs one prediction at a time,
but this can be increased with the concurrency.max setting.
When all prediction slots are in use, the server returns 409 Conflict.
The client should ensure prediction slots are available
before creating a new prediction with a different ID.
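One way a client might handle 409 Conflict is to retry with exponential backoff. The sketch below is transport-agnostic: send is a hypothetical callable supplied by the caller that performs the request and returns the HTTP status code, and the attempt count and delays are arbitrary defaults, not values recommended by Cog.

```python
import time
from typing import Callable


def create_with_retry(
    send: Callable[[], int],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 409 or attempts run out."""
    status = 409
    for attempt in range(attempts):
        status = send()
        if status != 409:
            return status
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return status
```

The sleep parameter exists only so the delay can be replaced in tests.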
Clients are responsible for providing unique prediction IDs.
We recommend generating a UUIDv4 or UUIDv7,
base32-encoding that value,
and removing the trailing padding characters (=).
This produces a random identifier that is 26 ASCII characters long.
>>> from uuid import uuid4
>>> from base64 import b32encode
>>> b32encode(uuid4().bytes).decode('utf-8').lower().rstrip('=')
'wjx3whax6rf4vphkegkhcvpv6a'
File uploads
A model's run function can produce file output by yielding or returning
a cog.Path or cog.File value.
By default, files are returned as a base64-encoded data URL.
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
{
"input": {"prompt": "A picture of an onion with sunglasses"}
}
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "succeeded",
"output": "data:image/png;base64,..."
}
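A client that receives a data URL needs to decode it before writing the file to disk. Here is a minimal sketch using the standard library; decode_data_url is a helper name chosen for this example and handles only the base64 form shown above.

```python
import base64


def decode_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (media_type, raw_bytes)."""
    if not data_url.startswith("data:"):
        raise ValueError("not a data URL")
    header, _, payload = data_url[len("data:"):].partition(",")
    media_type, *params = header.split(";")
    if "base64" not in params:
        raise ValueError("only base64-encoded data URLs are handled here")
    return media_type, base64.b64decode(payload)
```

The returned bytes can be written straight to a file opened in binary mode, and the media type can guide the choice of file extension.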
When creating a prediction synchronously,
the client can configure a base URL to upload output files to instead
by setting the output_file_prefix parameter in the request body:
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
{
"input": {"prompt": "A picture of an onion with sunglasses"},
"output_file_prefix": "https://example.com/upload"
}
When the model produces a file output, the server sends the following request to upload the file to the configured URL:
PUT /upload HTTP/1.1
Host: example.com
Content-Type: multipart/form-data
--boundary
Content-Disposition: form-data; name="file"; filename="image.png"
Content-Type: image/png
<binary data>
--boundary--
If the upload succeeds, the server responds with output:
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "succeeded",
"output": "https://example.com/upload/image.png"
}
If the upload fails, the server responds with an error.
[!IMPORTANT] File uploads for predictions created asynchronously require --upload-url to be specified when starting the HTTP server.
Endpoints
GET /
Returns a discovery document listing available API endpoints, the OpenAPI schema URL, and version information.
GET / HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
"cog_version": "0.17.0",
"docs_url": "/docs",
"openapi_url": "/openapi.json",
"shutdown_url": "/shutdown",
"healthcheck_url": "/health-check",
"predictions_url": "/predictions",
"predictions_idempotent_url": "/predictions/{prediction_id}",
"predictions_cancel_url": "/predictions/{prediction_id}/cancel"
}
If training is configured, the response also includes
trainings_url, trainings_idempotent_url, and trainings_cancel_url fields.
GET /health-check
Returns the current health status of the model container.
This endpoint always responds with 200 OK —
check the status field in the response body to determine readiness.
The response body is a JSON object with the following fields:
- status: One of the following values:
  - STARTING: The model's setup() method is still running.
  - READY: The model is ready to accept predictions.
  - BUSY: The model is ready but all prediction slots are in use.
  - SETUP_FAILED: The model's setup() method raised an exception.
  - DEFUNCT: The model encountered an unrecoverable error.
  - UNHEALTHY: The model is ready but a user-defined healthcheck() method returned False.
- setup: Setup phase details (included once setup has started):
  - started_at: ISO 8601 timestamp of when setup began.
  - completed_at: ISO 8601 timestamp of when setup finished (if complete).
  - status: One of starting, succeeded, or failed.
  - logs: Output captured during setup.
- version: Runtime version information:
  - coglet: Coglet version.
  - cog: Cog Python SDK version (if available).
  - python: Python version (if available).
- user_healthcheck_error: Error message from a user-defined healthcheck() method (if applicable).
GET /health-check HTTP/1.1
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "READY",
"setup": {
"started_at": "2025-01-01T00:00:00.000000+00:00",
"completed_at": "2025-01-01T00:00:05.000000+00:00",
"status": "succeeded",
"logs": ""
},
"version": {
"coglet": "0.17.0",
"cog": "0.14.0",
"python": "3.13.0"
}
}
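Because the endpoint always returns 200 OK, a readiness probe has to look at the body rather than the status code. The sketch below shows one way to bucket the status values; the function name and the three labels "ready", "wait", and "failed" are choices made for this example, and treating UNHEALTHY as a failure is an assumption you may want to change.

```python
def classify_health(body: dict) -> str:
    """Map a /health-check response body to "ready", "wait", or "failed"."""
    status = body.get("status")
    if status == "READY":
        return "ready"
    if status in ("STARTING", "BUSY"):
        return "wait"  # still setting up, or all prediction slots are in use
    # SETUP_FAILED, DEFUNCT, UNHEALTHY, or anything unexpected
    return "failed"
```

A deployment script could poll until the result is "ready" and abort as soon as it sees "failed".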
GET /openapi.json
The OpenAPI specification of the API, which is derived from the input and output types specified in your model's Predictor and Training objects.
POST /predictions
Makes a single prediction.
The request body is a JSON object with the following fields:
- input: A JSON object with the same keys as the arguments to the run() function. Any File or Path inputs are passed as URLs.
The response body is a JSON object with the following fields:
- status: Either succeeded or failed.
- output: The return value of the run() function.
- error: If status is failed, the error message.
- metrics: An object containing prediction metrics. Always includes predict_time (elapsed seconds). May also include custom metrics recorded by the model using self.record_metric().
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
{
"input": {
"image": "https://example.com/image.jpg",
"text": "Hello world!"
}
}
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "succeeded",
"output": "data:image/png;base64,...",
"metrics": {
"predict_time": 4.52
}
}
If the client sets the Prefer: respond-async header in their request,
the server responds immediately after starting the prediction
with 202 Accepted status and a prediction object in status starting.
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async
{
"input": {"prompt": "A picture of an onion with sunglasses"}
}
HTTP/1.1 202 Accepted
Content-Type: application/json
{
"status": "starting"
}
If the client sets the Accept: text/event-stream header,
the server starts the prediction asynchronously and responds with a
server-sent event stream.
See Streaming predictions with server-sent events.
PUT /predictions/<prediction_id>
Makes a single prediction.
This is the idempotent version of the POST /predictions endpoint.
PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8
{
"input": {"prompt": "A picture of an onion with sunglasses"}
}
HTTP/1.1 200 OK
Content-Type: application/json
{
"status": "succeeded",
"output": "data:image/png;base64,..."
}
If the client sets the Prefer: respond-async header in their request,
the server responds immediately after starting the prediction
with 202 Accepted status and a prediction object in status starting.
PUT /predictions/wjx3whax6rf4vphkegkhcvpv6a HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async
{
"input": {"prompt": "A picture of an onion with sunglasses"}
}
HTTP/1.1 202 Accepted
Content-Type: application/json
{
"id": "wjx3whax6rf4vphkegkhcvpv6a",
"status": "starting"
}
If the client sets the Accept: text/event-stream header,
the server starts the prediction asynchronously and responds with a
server-sent event stream.
If a prediction with the same ID is already running,
the server returns a stream for the existing prediction.
See Streaming predictions with server-sent events.
POST /predictions/<prediction_id>/cancel
A client can cancel an asynchronous prediction by making a
POST /predictions/<prediction_id>/cancel request
using the prediction id provided when the prediction was created.
For example, if the client creates a prediction by sending the request:
POST /predictions HTTP/1.1
Content-Type: application/json; charset=utf-8
Prefer: respond-async
{
"id": "abcd1234",
"input": {"prompt": "A picture of an onion with sunglasses"}
}
The client can cancel the prediction by sending the request:
POST /predictions/abcd1234/cancel HTTP/1.1
A prediction cannot be canceled if it's
created synchronously, without the Prefer: respond-async header,
or created without a provided id.
If a prediction exists with the provided id,
the server responds with status 200 OK.
Otherwise, the server responds with status 404 Not Found.
When a prediction is canceled,
Cog raises CancelationException
in sync predictors (or asyncio.CancelledError in async predictors).
This exception may be caught by the model to perform necessary cleanup.
The cleanup should be brief, ideally completing within a few seconds.
After cleanup, the exception must be re-raised using a bare raise statement.
Failure to re-raise the exception may result in the termination of the container.
from cog import BaseRunner, CancelationException, Input, Path


class Runner(BaseRunner):
    def run(self, image: Path = Input(description="Image to process")) -> Path:
        try:
            return self.process(image)
        except CancelationException:
            self.cleanup()
            raise  # always re-raise
Notebooks
Cog plays nicely with Jupyter notebooks.
Install the jupyterlab Python package
First, add jupyterlab to your requirements.txt file and reference it in cog.yaml:
requirements.txt:
jupyterlab
cog.yaml:
build:
  python_requirements: requirements.txt
Run a notebook
Cog can run notebooks in the environment you've defined in cog.yaml with the following command:
cog exec -p 8888 jupyter lab --allow-root --ip=0.0.0.0
Use notebook code in your runner
You can also import a notebook into your Cog Runner file.
First, export your notebook to a Python file:
jupyter nbconvert --to script my_notebook.ipynb # creates my_notebook.py
Then import the exported Python script into your run.py file. Any functions or variables defined in your notebook will be available to your runner:
from cog import BaseRunner, Input

import my_notebook


class Runner(BaseRunner):
    def run(self, prompt: str = Input(description="string prompt")) -> str:
        output = my_notebook.do_stuff(prompt)
        return output
Private package registry
This guide describes how to build a Docker image with Cog that fetches Python packages from a private registry during setup.
pip.conf
In a directory outside your Cog project, create a pip.conf file with an index-url set to the registry's URL with embedded credentials.
[global]
index-url = https://username:password@my-private-registry.com
Warning: Be careful not to commit secrets in Git or include them in Docker images. If your Cog project contains any sensitive files, make sure they're listed in .gitignore and .dockerignore.
cog.yaml
In your project's cog.yaml file, add a setup command to run pip install with a secret configuration file mounted to /etc/pip.conf.
build:
  run:
    - command: pip install
      mounts:
        - type: secret
          id: pip
          target: /etc/pip.conf
Build
When building or pushing your model with Cog, pass the --secret option with an id matching the one specified in cog.yaml, along with a path to your local pip.conf file.
$ cog build --secret id=pip,source=/path/to/pip.conf
Using a secret mount allows the private registry credentials to be securely passed to the pip install setup command, without baking them into the Docker image.
Warning: If you run cog build or cog push and then change the contents of a secret source file, the cached version of the file will be used on subsequent builds, ignoring any changes you made. To update the contents of the target secret file, either change the id value in cog.yaml and the --secret option, or pass the --no-cache option to bypass the cache entirely.
Run interface reference
This document defines the API of the cog Python module, which is used to define the interface for running your model.
[!TIP] Run cog init to generate an annotated run.py file that can be used as a starting point for setting up your model.
[!TIP] Using a language model to help you write the code for your new Cog model?
Feed it https://cog.run/llms.txt, which has all of Cog's documentation bundled into a single file. To learn more about this format, check out llmstxt.org.
Contents
- BaseRunner
- async runners and concurrency
- Input(**kwargs)
- Output
- Metrics
- Cancellation
- Input and output types
BaseRunner
You define how Cog runs your model by defining a class that inherits from BaseRunner. It looks something like this:
from cog import BaseRunner, Path, Input
import torch


class Runner(BaseRunner):
    def setup(self):
        """Load the model into memory to make running multiple inferences efficient"""
        self.model = torch.load("weights.pth")

    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5)
    ) -> Path:
        """Run the model"""
        # ... pre-processing ...
        output = self.model(image)
        # ... post-processing ...
        return output
Your Runner class should define two methods: setup() and run().
BasePredictor, Predictor, and predict() still work for existing models, but they are deprecated. Cog warns when it loads or inspects those legacy names. Use BaseRunner, Runner, and run() for new code.
Runner.setup()
Prepare the model so multiple runs are efficient.
Use this optional method to include expensive one-off operations like loading trained models, instantiating data transformations, etc.
Many models use this method to download their weights (e.g. using pget). This has some advantages:
- Smaller image sizes
- Faster build times
- Faster pushes and inference on Replicate
However, this may also significantly increase your setup() time.
As an alternative, some choose to store their weights directly in the image. You can simply leave your weights in the directory alongside your cog.yaml and ensure they are not excluded in your .dockerignore file.
While this will increase your image size and build time, it offers other advantages:
- Faster setup() time
- Ensures idempotency and reduces your model's reliance on external systems
- Preserves reproducibility as your model will be self-contained in the image
When using this method, you should use the --separate-weights flag on cog build to store weights in a separate layer.
Runner.run(**kwargs)
Run the model.
This required method is where you call the model that was loaded during setup(), but you may also want to add pre- and post-processing code here.
The run() method takes an arbitrary list of named arguments, where each argument name must correspond to an Input() annotation.
run() can return strings, numbers, cog.Path objects representing files on disk, or lists or dicts of those types. You can also define a custom BaseModel for structured return types. See Input and output types for the full list of supported types.
async runners and concurrency
Added in cog 0.14.0.
You may specify your run() method as async def run(...). In
addition, if you have an async run() function you may also have an async
setup() function:
class Runner(BaseRunner):
    async def setup(self) -> None:
        print("async setup is also supported...")

    async def run(self) -> str:
        print("async run")
        return "hello world"
Models that have an async run() function can run concurrently, up to the limit specified by concurrency.max in cog.yaml. Attempting to exceed this limit will return a 409 Conflict response.
Input(**kwargs)
Use cog's Input() function to define each of the parameters in your run() method:
class Runner(BaseRunner):
    def run(self,
            image: Path = Input(description="Image to enlarge"),
            scale: float = Input(description="Factor to scale image by", default=1.5, ge=1.0, le=10.0)
    ) -> Path:
        ...
The Input() function takes these keyword arguments:
description: A description of what to pass to this input for users of the model.default: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set toNone, the input is optional.ge: Forintorfloattypes, the value must be greater than or equal to this number.le: Forintorfloattypes, the value must be less than or equal to this number.min_length: Forstrtypes, the minimum length of the string.max_length: Forstrtypes, the maximum length of the string.regex: Forstrtypes, the string must match this regular expression.choices: Forstrorinttypes, a list of possible values for this input.deprecated: (optional) If set toTrue, marks this input as deprecated. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future. See Deprecating inputs.
Each parameter of the run() method must be annotated with a type like str, int, float, bool, etc. See Input and output types for the full list of supported types.
Using the Input function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:
class Runner(BaseRunner):
    def run(self,
            iterations: int,                 # valid: no default (must come before parameters with defaults)
            prompt: str = "default prompt",  # also valid: plain Python default
    ) -> str:
        ...
Deprecating inputs
You can mark an input as deprecated by passing deprecated=True to the Input() function. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future.
This is useful when you want to phase out an input without breaking existing clients immediately:
from cog import BaseRunner, Input


class Runner(BaseRunner):
    def run(self,
            text: str = Input(description="Some deprecated text", deprecated=True),
            prompt: str = Input(description="Prompt for the model")
    ) -> str:
        # ...
        return prompt
Output
Cog runners can return a simple data type like a string, integer, float, or boolean. Use Python's -> <type> syntax to annotate the return type.
Here's an example of a runner that returns a string:
from cog import BaseRunner


class Runner(BaseRunner):
    def run(self) -> str:
        return "hello"
Returning an object
To return a complex object with multiple values, define an Output object with multiple fields to return from your run() method:
import io

from cog import BaseRunner, BaseModel, File


class Output(BaseModel):
    file: File
    text: str


class Runner(BaseRunner):
    def run(self) -> Output:
        return Output(text="hello", file=io.StringIO("hello"))
Each of the output object's properties must be one of the supported output types. For the full list, see Input and output types.
Returning a list
The run() method can return a list of any of the supported output types. Here's an example that outputs multiple files:
from cog import BaseRunner, Path


class Runner(BaseRunner):
    def run(self) -> list[Path]:
        items = ["foo", "bar", "baz"]
        output = []
        for i, item in enumerate(items):
            out_path = Path(f"/tmp/out-{i}.txt")
            with out_path.open("w") as f:
                f.write(item)
            output.append(out_path)
        return output
Files are named in the format output.<index>.<extension>, e.g. output.0.txt, output.1.txt, and output.2.txt from the example above.
Optional properties
To conditionally omit properties from the Output object, define them using typing.Optional:
import io
from typing import Optional

from cog import BaseModel, BaseRunner, Path


class Output(BaseModel):
    score: Optional[float]
    file: Optional[Path]


class Runner(BaseRunner):
    def run(self) -> Output:
        if condition:  # placeholder for your own logic
            return Output(score=1.5)
        else:
            return Output(file=io.StringIO("hello"))
Streaming output
Cog models can stream output as the run() method is running. For example, a language model can output tokens as they're being generated and an image generation model can output images as they are being generated.
To support streaming output in your Cog model, add from typing import Iterator to your run.py file. The typing package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the run() method in the form -> Iterator[<type>] where <type> can be one of str, int, float, bool, or cog.Path.
To allow clients to receive chunks as server-sent events with Accept: text/event-stream, decorate the prediction method (run() or predict()) with @cog.streaming (or @streaming if imported directly from cog). The parenthesized forms @cog.streaming() and @streaming() are also accepted. The decorated method must return Iterator[...], AsyncIterator[...], ConcatenateIterator[...], or AsyncConcatenateIterator[...]. Without the decorator, iterator outputs still work in normal JSON responses, but SSE requests return 406 Not Acceptable.
from typing import Iterator

from cog import BaseRunner, Path, streaming


class Runner(BaseRunner):
    @streaming
    def run(self) -> Iterator[Path]:
        done = False
        while not done:
            output_path, done = do_stuff()
            yield Path(output_path)
If you have an async run() method, use AsyncIterator from the typing module:
from typing import AsyncIterator

from cog import BaseRunner, Path, streaming


class Runner(BaseRunner):
    @streaming
    async def run(self) -> AsyncIterator[Path]:
        done = False
        while not done:
            output_path, done = do_stuff()
            yield Path(output_path)
If you're streaming text output, you can use ConcatenateIterator to hint that the output should be concatenated together into a single string. This is useful on Replicate to display the output as a string instead of a list of strings.
from cog import BaseRunner, ConcatenateIterator, streaming
class Runner(BaseRunner):
@streaming
def run(self) -> ConcatenateIterator[str]:
tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
for token in tokens:
yield token + " "
Or for async run() methods, use AsyncConcatenateIterator:
from cog import AsyncConcatenateIterator, BaseRunner, streaming
class Runner(BaseRunner):
@streaming
async def run(self) -> AsyncConcatenateIterator[str]:
tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
for token in tokens:
yield token + " "
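The difference between a plain iterator and a concatenating one is only in how a consumer presents the chunks. The following standard-library sketch, which does not use Cog itself, shows the two presentations for the same token stream: a list of strings versus one joined string. The run() function here is a plain generator standing in for a runner method.

```python
from typing import Iterator


def run() -> Iterator[str]:
    """Plain generator standing in for a streaming run() method."""
    tokens = ["The", "quick", "brown", "fox"]
    for token in tokens:
        yield token + " "


# Iterator[str]: a consumer sees a list of chunks.
as_list = list(run())

# ConcatenateIterator[str]: the hint says the chunks form a single string.
as_text = "".join(run())

print(as_list)
print(repr(as_text))
```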
Metrics
You can record custom metrics from your run() function to track model-specific data like token counts, timing breakdowns, or confidence scores. Metrics are included in the response alongside the output.
Recording metrics
Use self.record_metric() inside your run() method:
from cog import BaseRunner
class Runner(BaseRunner):
def run(self, prompt: str) -> str:
self.record_metric("temperature", 0.7)
self.record_metric("token_count", 42)
result = self.model.generate(prompt)
return result
For advanced use (dict-style access, deleting metrics), use self.scope:
self.scope.metrics["token_count"] = 42
del self.scope.metrics["token_count"]
Metrics appear in the response metrics field:
{
"status": "succeeded",
"output": "...",
"metrics": {
"temperature": 0.7,
"token_count": 42,
"predict_time": 1.23
}
}
The predict_time metric is always added automatically by the runtime.
Supported value types are bool, int, float, str, list, and dict. Setting a metric to None deletes it.
Naming rules
Metric names must follow these rules:
- Each segment must start with a letter (a-z, A-Z) and end with a letter or digit
- Segments can contain letters, digits, and underscores (_)
- Segments cannot start or end with underscores
- Segments cannot contain consecutive underscores (__)
- Use dots (.) to create nested objects (e.g., timing.inference produces {"timing": {"inference": ...}})
- Maximum 128 characters total
- Maximum 4 dot-separated segments
- Cannot be predict_time (reserved by runtime)
- Cannot start with cog. (reserved for system metrics)
Valid examples: temperature, token_count, TTFT, T2I_latency, timing.preprocess
Invalid examples: _token, token_, foo__bar, .foo, foo..bar, foo bar, cog.system
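If you want to check metric names before recording them, for example in a unit test, the naming rules can be approximated with a small validator. This is a sketch written from the rules listed in this section, not Cog's own validation code; the function name is_valid_metric_name is hypothetical.

```python
import re

# One segment: starts with a letter, then letters/digits, with single
# underscores allowed only between two alphanumeric characters.
_SEGMENT = re.compile(r"[A-Za-z](?:_?[A-Za-z0-9])*")
MAX_LENGTH = 128
MAX_SEGMENTS = 4


def is_valid_metric_name(name: str) -> bool:
    """Approximate the documented metric naming rules (not Cog's own code)."""
    if not name or len(name) > MAX_LENGTH:
        return False
    if name == "predict_time" or name.startswith("cog."):
        return False  # reserved names
    segments = name.split(".")
    if len(segments) > MAX_SEGMENTS:
        return False
    return all(_SEGMENT.fullmatch(segment) is not None for segment in segments)


valid = ["temperature", "token_count", "TTFT", "T2I_latency", "timing.preprocess"]
invalid = ["_token", "token_", "foo__bar", ".foo", "foo..bar", "foo bar", "cog.system"]

print([is_valid_metric_name(n) for n in valid])
print([is_valid_metric_name(n) for n in invalid])
```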
Accumulation modes
By default, recording a metric replaces any previous value for that key. You can use accumulation modes to build up values across multiple calls:
# Increment a counter (adds to the existing numeric value)
self.record_metric("token_count", 1, mode="incr")
self.record_metric("token_count", 1, mode="incr")
# Result: {"token_count": 2}
# Append to an array
self.record_metric("steps", "preprocessing", mode="append")
self.record_metric("steps", "inference", mode="append")
# Result: {"steps": ["preprocessing", "inference"]}
# Replace (default behavior)
self.record_metric("status", "running", mode="replace")
self.record_metric("status", "done", mode="replace")
# Result: {"status": "done"}
The mode parameter accepts "replace" (default), "incr", or "append".
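To make the three modes concrete, here is a dictionary-backed sketch of the semantics described above. MetricStore is a hypothetical stand-in, not part of Cog, and details such as starting a counter at 0 when the key does not exist yet are assumptions inferred from the examples.

```python
class MetricStore:
    """Hypothetical stand-in that mimics the documented accumulation modes."""

    def __init__(self) -> None:
        self.metrics: dict = {}

    def record(self, key: str, value, mode: str = "replace") -> None:
        if value is None:
            # Setting a metric to None deletes it.
            self.metrics.pop(key, None)
        elif mode == "replace":
            self.metrics[key] = value
        elif mode == "incr":
            # Assumption: a missing key counts as 0.
            self.metrics[key] = self.metrics.get(key, 0) + value
        elif mode == "append":
            self.metrics.setdefault(key, []).append(value)
        else:
            raise ValueError(f"unknown mode: {mode!r}")


store = MetricStore()
store.record("token_count", 1, mode="incr")
store.record("token_count", 1, mode="incr")
store.record("steps", "preprocessing", mode="append")
store.record("steps", "inference", mode="append")
store.record("status", "running")
store.record("status", "done")
store.record("scratch", 123)
store.record("scratch", None)  # removes the key again

print(store.metrics)
```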
Dot-path keys
Use dot-separated keys to create nested objects in the metrics output:
self.record_metric("timing.preprocess", 0.12)
self.record_metric("timing.inference", 0.85)
This produces nested JSON:
{
"metrics": {
"timing": {
"preprocess": 0.12,
"inference": 0.85
},
"predict_time": 1.23
}
}
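The mapping from dot-separated keys to nested JSON can be reproduced in a few lines. The helper below is an illustrative sketch, not Cog's implementation, and the name nest is hypothetical. It does not handle conflicts where the same name is used both as a value and as a parent object.

```python
def nest(flat: dict) -> dict:
    """Expand dot-separated keys into nested dictionaries."""
    nested: dict = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            # Walk down, creating intermediate objects as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


nested = nest(
    {
        "timing.preprocess": 0.12,
        "timing.inference": 0.85,
        "predict_time": 1.23,
    }
)
print(nested)
```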
Type safety
Once a metric key has been assigned a value of a certain type, it cannot be changed to a different type without deleting it first. This prevents accidental type mismatches when using accumulation modes:
self.record_metric("count", 1)
# This would raise an error — "count" is an int, not a string:
# self.record_metric("count", "oops")
# Delete first, then set with new type:
del self.scope.metrics["count"]
self.record_metric("count", "now a string")
Outside an active run, self.record_metric() and self.scope are silent no-ops — no need for None checks.
Cancellation
When a run is canceled (via the cancel HTTP endpoint or a dropped connection), the Cog runtime interrupts the running run() function. The exception raised depends on whether the runner is sync or async:
| Runner type | Exception raised |
|---|---|
| Sync (def run) | CancelationException |
| Async (async def run) | asyncio.CancelledError |
CancelationException
from cog import CancelationException
CancelationException is raised in sync runners when a run is cancelled. It is a BaseException subclass, not an Exception subclass, so except Exception blocks in your run code will not accidentally catch it. This matches the behavior of KeyboardInterrupt and asyncio.CancelledError. Note that a truly bare except: clause does catch BaseException subclasses, so avoid bare except: in run code.
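The effect of deriving from BaseException can be demonstrated without Cog installed. In this sketch, FakeCancelation is a hypothetical stand-in for cog.CancelationException; the only property it shares with the real class is the one described above, namely that it subclasses BaseException directly.

```python
class FakeCancelation(BaseException):
    """Stand-in for cog.CancelationException (hypothetical, for illustration)."""


def run_step() -> None:
    # Simulates the runtime interrupting a sync run.
    raise FakeCancelation()


caught_by = None
try:
    try:
        run_step()
    except Exception:
        # Not reached: FakeCancelation is not an Exception subclass.
        caught_by = "except Exception"
except FakeCancelation:
    caught_by = "outer handler"

print(caught_by)
```

A broad error handler written as except Exception therefore lets the cancellation pass through to the runtime untouched.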
You do not need to handle this exception in normal runner code — the runtime manages cancellation automatically. However, if you need to run cleanup logic when a run is cancelled, you can catch it explicitly:
from cog import BaseRunner, CancelationException, Path
class Runner(BaseRunner):
def run(self, image: Path) -> Path:
try:
return self.process(image)
except CancelationException:
self.cleanup()
raise # always re-raise
[!WARNING]
You must re-raise CancelationException after cleanup. Swallowing it will prevent the runtime from marking the run as canceled, and may result in the termination of the container.
CancelationException is available as:
- cog.CancelationException (recommended)
- cog.exceptions.CancelationException
For async runners, cancellation follows standard Python async conventions and raises asyncio.CancelledError instead.
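The same cleanup-then-re-raise pattern applies to async code. The sketch below uses only asyncio and a plain coroutine in place of an async run() method; the names run_with_cleanup and cleanup_log are hypothetical. It cancels the task the way an event loop would and shows that the cleanup runs and the task still ends up marked as cancelled because the exception is re-raised.

```python
import asyncio

cleanup_log: list[str] = []


async def run_with_cleanup() -> None:
    """Stand-in for an async run() that needs cleanup on cancellation."""
    try:
        await asyncio.sleep(60)  # long-running work
    except asyncio.CancelledError:
        cleanup_log.append("cleaned up")
        raise  # always re-raise so the cancellation is recorded


async def main() -> bool:
    task = asyncio.create_task(run_with_cleanup())
    await asyncio.sleep(0)  # let the task start and reach its first await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
print(was_cancelled, cleanup_log)
```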
Input and output types
Each parameter of the run() method must be annotated with a type. The method's return type must also be annotated.
Primitive types
These types can be used directly as input parameter types and output return types:
| Type | Description | JSON Schema |
|---|---|---|
| str | A string | string |
| int | An integer | integer |
| float | A floating-point number | number |
| bool | A boolean | boolean |
| cog.Path | A path to a file on disk | string (format: uri) |
| cog.File | A file-like object (deprecated) | string (format: uri) |
| cog.Secret | A string containing sensitive information | string (format: password) |
cog.Path
cog.Path is used to get files in and out of models. It represents a path to a file on disk.
cog.Path is a subclass of Python's pathlib.Path and can be used as a drop-in replacement. Any os.PathLike subclass is also accepted as an input type and treated as cog.Path.
For models that return a cog.Path object, the output returned by Cog's built-in HTTP server will be a URL.
This example takes an input file, resizes it, and returns the resized image:
import tempfile
from cog import BaseRunner, Input, Path
class Runner(BaseRunner):
def run(self, image: Path = Input(description="Image to enlarge")) -> Path:
upscaled_image = do_some_processing(image)
# To output cog.Path objects the file needs to exist, so create a temporary file first.
# This file will automatically be deleted by Cog after it has been returned.
output_path = Path(tempfile.mkdtemp()) / "upscaled.png"
upscaled_image.save(output_path)
return Path(output_path)
cog.File (deprecated)
[!WARNING]
cog.File is deprecated and will be removed in a future version of Cog. Use cog.Path instead.
cog.File represents a file handle. For models that return a cog.File object, the output returned by Cog's built-in HTTP server will be a URL.
from cog import BaseRunner, File, Input
from PIL import Image
class Runner(BaseRunner):
def run(self, source_image: File = Input(description="Image to enlarge")) -> File:
pillow_img = Image.open(source_image)
upscaled_image = do_some_processing(pillow_img)
return File(upscaled_image)
cog.Secret
cog.Secret signifies that an input holds sensitive information like a password or API token.
cog.Secret redacts its contents in string representations to prevent accidental disclosure. Access the underlying value with get_secret_value().
from cog import BaseRunner, Secret
class Runner(BaseRunner):
def run(self, api_token: Secret) -> None:
# Prints '**********'
print(api_token)
# Use get_secret_value method to see the secret's content.
print(api_token.get_secret_value())
A runner's Secret inputs are represented in OpenAPI with the following schema:
{
"type": "string",
"format": "password",
"x-cog-secret": true
}
Replicate treats secret inputs differently from other inputs throughout its system. When you create a run on Replicate, any value passed to a Secret input is redacted after being sent to the model.
[!WARNING]
Passing secret values to untrusted models can result in unintended disclosure, exfiltration, or misuse of sensitive data.
Wrapper types
Cog supports wrapper types that modify how a primitive type is treated.
Optional
Use Optional[T] or T | None (Python 3.10+) to mark an input as optional. Optional inputs default to None if not provided.
from typing import Optional
from cog import BaseRunner, Input
class Runner(BaseRunner):
def run(self,
prompt: Optional[str] = Input(description="Input prompt"),
seed: int | None = Input(description="Random seed", default=None),
) -> str:
if prompt is None:
return "hello"
return "hello " + prompt
Prefer Optional[T] or T | None over str = Input(default=None) for inputs that can be None. This lets type checkers warn about error-prone None values:
# Bad: type annotation says str but value can be None
def run(self, prompt: str = Input(default=None)) -> str:
return "hello" + prompt # TypeError at runtime if prompt is None
# Good: type annotation matches actual behavior
def run(self, prompt: Optional[str] = Input(description="prompt")) -> str:
if prompt is None:
return "hello"
return "hello " + prompt
[!NOTE]
Optional[T] is supported in BaseModel output fields but not as a top-level return type. Use a BaseModel with optional fields instead.
Union
Use A | B or Union[A, B] to accept more than one type for a single input. Cog supports JSON-native union members: str, int, float, bool, dict/Any, list[T], and None.
from cog import BaseRunner, Input
class Runner(BaseRunner):
def run(self,
value: str | float = Input(description="A string or a number"),
) -> str:
return f"{type(value).__name__}:{value}"
At runtime, Cog validates the request against the union and passes the value through as the matching type. For overlapping numeric types, Cog prefers the most specific match (e.g. bool before int, int before float), and a JSON integer is accepted for a float member.
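The "most specific match" rule can be pictured as a fixed order of checks. The function below is a sketch of that ordering based only on the description above; it is not Cog's validator, the name match_union_member is hypothetical, and it covers only the scalar members str, int, float, and bool.

```python
def match_union_member(value, members: tuple) -> type:
    """Return the union member a JSON value matches, most specific first."""
    if bool in members and isinstance(value, bool):
        return bool
    # In Python, bool is a subclass of int, so exclude it explicitly.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if int in members and is_int:
        return int
    # A JSON integer is accepted for a float member.
    if float in members and (is_int or isinstance(value, float)):
        return float
    if str in members and isinstance(value, str):
        return str
    raise TypeError(f"{value!r} does not match any of {members}")


print(match_union_member(True, (bool, int)).__name__)
print(match_union_member(1, (int, float)).__name__)
print(match_union_member(1, (str, float)).__name__)
print(match_union_member("1", (str, float)).__name__)
```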
Combine a union with None to make it nullable:
def run(self, value: str | float | None = Input(default=None)) -> str: ...
Union inputs are validated at the HTTP boundary, so unions involving Path, File, Secret, custom coders, and BaseModel are not supported, and the build fails if you use them. Union return types are also unsupported — use a BaseModel output instead.
list
Use list[T] or List[T] to accept or return a list of values. T can be a supported Cog type, but nested container types are not supported.
As an input type:
from cog import BaseRunner, Path
class Runner(BaseRunner):
def run(self, paths: list[Path]) -> str:
output_parts = []
for path in paths:
with open(path) as f:
output_parts.append(f.read())
return "".join(output_parts)
With cog run, repeat the input name to pass multiple values:
$ echo test1 > 1.txt
$ echo test2 > 2.txt
$ cog run -i paths=@1.txt -i paths=@2.txt
As an output type:
from cog import BaseRunner, Path
class Runner(BaseRunner):
def run(self) -> list[Path]:
items = ["foo", "bar", "baz"]
output = []
for i, item in enumerate(items):
out_path = Path(f"/tmp/out-{i}.txt")
with out_path.open("w") as f:
f.write(item)
output.append(out_path)
return output
Files are named in the format output.<index>.<extension>, e.g. output.0.txt, output.1.txt, output.2.txt.
dict
Use dict to accept or return an opaque JSON object. The value is passed through as-is without type validation.
from cog import BaseRunner, Input
class Runner(BaseRunner):
def run(self,
params: dict = Input(description="Arbitrary JSON parameters"),
) -> dict:
return {"greeting": "hello", "params": params}
[!NOTE]
dict inputs and outputs are represented as {"type": "object"} in the OpenAPI schema with no additional structure. For structured data with validated fields, use a BaseModel instead.
cog.Opaque
Cog statically analyzes run() type annotations to generate schemas. Some third-party package types, such as vLLM TypedDict definitions, may not be visible to that static analyzer even though they represent JSON-shaped object values at runtime.
Use typing.Annotated with cog.Opaque when you want Cog to accept or return those third-party object values without inspecting their fields:
from typing import Annotated
from cog import BaseRunner, Opaque
from vllm.entrypoints.chat_utils import CustomChatCompletionMessageParam
class Runner(BaseRunner):
def run(
self,
messages: Annotated[list[CustomChatCompletionMessageParam], Opaque],
) -> str:
return str(messages)
Opaque emits an object schema for the wrapped type and preserves the container shape. For example, Annotated[list[T], Opaque] is represented as an array of opaque objects.
Opaque does not inspect, validate, encode, decode, or transform values. It only tells Cog's schema generator to treat the wrapped type as an opaque JSON object. If your type needs custom serialization or deserialization, provide that separately; Opaque only affects schema generation.
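To illustrate what "preserves the container shape" means, the sketch below reads an Annotated hint with the standard typing helpers and produces a schema fragment. Everything here is an assumption for illustration: the local Opaque class is a stand-in marker rather than cog.Opaque, Message is a hypothetical TypedDict, and the exact JSON that Cog emits may contain more fields than shown.

```python
from typing import Annotated, TypedDict, get_args, get_origin


class Opaque:
    """Stand-in marker for cog.Opaque (hypothetical, for illustration)."""


class Message(TypedDict):
    """Hypothetical third-party object type."""

    role: str
    content: str


def opaque_schema(annotation) -> dict:
    """Build a schema fragment for an Annotated[..., Opaque] hint."""
    if get_origin(annotation) is not Annotated:
        raise TypeError("expected an Annotated[...] hint")
    inner, *metadata = get_args(annotation)
    if Opaque not in metadata:
        raise TypeError("hint is not marked Opaque")
    if get_origin(inner) is list:
        # The list container is kept; only the items become opaque objects.
        return {"type": "array", "items": {"type": "object"}}
    return {"type": "object"}


print(opaque_schema(Annotated[list[Message], Opaque]))
print(opaque_schema(Annotated[Message, Opaque]))
```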
Structured output with BaseModel
To return a complex object with multiple typed fields, define a class that inherits from cog.BaseModel or Pydantic's BaseModel and use it as your return type.
Using cog.BaseModel
cog.BaseModel subclasses are automatically converted to Python dataclasses. Define fields using standard type annotations:
from typing import Optional
from cog import BaseRunner, BaseModel, Path
class Output(BaseModel):
text: str
confidence: float
image: Optional[Path]
class Runner(BaseRunner):
def run(self, prompt: str) -> Output:
result = self.model.generate(prompt)
return Output(
text=result.text,
confidence=result.score,
image=None,
)
The output class can have any name — it does not need to be called Output:
from typing import Optional
from cog import BaseModel, Path
class SegmentationResult(BaseModel):
    success: bool
    error: Optional[str]
    segmented_image: Optional[Path]
Using Pydantic BaseModel
If you already use Pydantic v2 in your model, you can use a Pydantic BaseModel subclass directly as the output type:
from pydantic import BaseModel as PydanticBaseModel
from cog import BaseRunner
class Result(PydanticBaseModel):
name: str
score: float
tags: list[str]
class Runner(BaseRunner):
def run(self, prompt: str) -> Result:
return Result(name="example", score=0.95, tags=["fast", "accurate"])
BaseModel field types
Fields in a BaseModel output support these types:
| Type | Example |
|---|---|
| str, int, float, bool | score: float |
| cog.Path | image: Path |
| cog.File | data: File (deprecated) |
| cog.Secret | token: Secret |
| Optional[T] | error: Optional[str] |
| list[T] | tags: list[str] |
Type limitations
The following type patterns are not supported:
- Nested generics: list[list[str]], list[Optional[str]], Optional[list[str]] are not supported.
- Output union types beyond Optional: union return types and BaseModel union fields are not supported. Input unions of JSON-native types (str | int, str | float | None, etc.) are supported; see Union.
- Input unions of non-JSON-native types: input unions involving Path, File, Secret, custom coders, or BaseModel (e.g. Path | str) are not supported and fail at build time.
- Optional as a top-level return type: -> Optional[str] is not allowed. Use a BaseModel with optional fields instead.
- Nested BaseModel fields: A BaseModel field typed as another BaseModel is not supported in Cog's type system for schema generation.
- Tuple, Set, or other collection types: Only list and dict are supported as collection types.
Training interface reference
[!WARNING]
The cog train command is deprecated and will be removed in the next version of Cog. The training API described below may still be used with the HTTP API's /trainings endpoint, but the CLI command is no longer recommended for new projects.
Cog's training API allows you to define a fine-tuning interface for an existing Cog model, so users of the model can bring their own training data to create derivative fine-tuned models. Real-world examples of this API in use include fine-tuning SDXL with images or fine-tuning Llama 2 with structured text.
How it works
If you've used Cog before, you've probably seen the Runner class, which defines the interface for running your model. Cog's training API works similarly: You define a Python function that describes the inputs and outputs of the training process. The inputs are things like training data, epochs, batch size, seed, etc. The output is typically a file with the fine-tuned weights.
cog.yaml:
build:
python_version: "3.13"
train: "train.py:train"
train.py:
from cog import File
import io
def train(param: str) -> File:
return io.StringIO("hello " + param)
Then you can run it like this:
$ cog train -i param=train
...
$ cat weights
hello train
You can also use classes if you want to run many model trainings and save on setup time. This works the same way as the Runner class with the only difference being the train method.
cog.yaml:
build:
python_version: "3.13"
train: "train.py:Trainer"
train.py:
from cog import File
import io
class Trainer:
def setup(self) -> None:
self.base_model = ... # Load a big base model
def train(self, param: str) -> File:
return self.base_model.train(param) # Train on top of a base model
Input(**kwargs)
Use Cog's Input() function to define each of the parameters in your train() function:
from cog import Input, Path
def train(
train_data: Path = Input(description="HTTPS URL of a file containing training data"),
learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
seed: int = Input(description="random seed to use for training", default=None)
) -> str:
return "hello, weights"
The Input() function takes these keyword arguments:
- description: A description of what to pass to this input for users of the model.
- default: A default value to set the input to. If this argument is not passed, the input is required. If it is explicitly set to None, the input is optional.
- ge: For int or float types, the value must be greater than or equal to this number.
- le: For int or float types, the value must be less than or equal to this number.
- min_length: For str types, the minimum length of the string.
- max_length: For str types, the maximum length of the string.
- regex: For str types, the string must match this regular expression.
- choices: For str or int types, a list of possible values for this input.
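To show what these constraints mean for a caller, here is a standard-library sketch that checks a value against the numeric, length, and choice constraints. It is an illustration of the documented meaning of each argument, not Cog's validator; check_input is a hypothetical helper, and regex is left out because this section does not say whether the pattern must match the whole string or only part of it.

```python
from typing import Any, Optional, Sequence


def check_input(
    value: Any,
    *,
    ge: Optional[float] = None,
    le: Optional[float] = None,
    min_length: Optional[int] = None,
    max_length: Optional[int] = None,
    choices: Optional[Sequence] = None,
) -> list[str]:
    """Return a list of constraint violations (empty if the value is valid)."""
    errors = []
    if ge is not None and value < ge:
        errors.append(f"must be >= {ge}")
    if le is not None and value > le:
        errors.append(f"must be <= {le}")
    if min_length is not None and len(value) < min_length:
        errors.append(f"must have at least {min_length} characters")
    if max_length is not None and len(value) > max_length:
        errors.append(f"must have at most {max_length} characters")
    if choices is not None and value not in choices:
        errors.append(f"must be one of {list(choices)}")
    return errors


print(check_input(1e-4, ge=0))
print(check_input(-1, ge=0))
print(check_input("abc", min_length=5, max_length=10))
print(check_input("adam", choices=["sgd", "adamw"]))
```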
Each parameter of the train() function must be annotated with a type like str, int, float, bool, etc. See Input and output types for the full list of supported types.
Using the Input function provides better documentation and validation constraints to the users of your model, but it is not strictly required. You can also specify default values for your parameters using plain Python, or omit default assignment entirely:
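def train(self,
    iterations: int,  # no default: this input is required
    training_data: str = "foo bar",  # plain Python default, also valid
) -> str:
    # Parameters without defaults must come before those with defaults.
    ...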
Training Output
Training output is typically a binary weights file. To return a custom object or multiple values, define a TrainingOutput class with the fields you need and use it as the return type annotation of your train() function:
from cog import BaseModel, Input, Path
class TrainingOutput(BaseModel):
weights: Path
def train(
train_data: Path = Input(description="HTTPS URL of a file containing training data"),
learning_rate: float = Input(description="learning rate, for learning!", default=1e-4, ge=0),
seed: int = Input(description="random seed to use for training", default=42)
) -> TrainingOutput:
weights_file = generate_weights("...")
return TrainingOutput(weights=Path(weights_file))
Testing
If you are developing a Cog model like Llama or SDXL, you can test that the fine-tuned code path works before pushing by setting the COG_WEIGHTS environment variable when running cog run:
cog run -e COG_WEIGHTS=https://replicate.delivery/pbxt/xyz/weights.tar -i prompt="a photo of TOK"
Using cog on Windows 11 with WSL 2
- 0. Prerequisites
- 1. Install the GPU driver
- 2. Unlocking features
- 3. Update MS Linux kernel
- 4. Configure WSL 2
- 5. Configure CUDA WSL-Ubuntu Toolkit
- 6. Install Docker
- 7. Install cog and pull an image
- 8. Run a model in WSL 2
- 9. References
Running cog on Windows is now possible thanks to WSL 2. Follow this guide to enable WSL 2 and G
… [truncated — open the raw llms.txt above for the full file]